DOCUMENT RESUME 



ED 248 264 



TM 840 567 



AUTHOR: Suhadolnik, Debra; Weiss, David J.
TITLE: Effect of Examinee Certainty on Probabilistic Test Scores and a
    Comparison of Scoring Methods for Probabilistic Responses.
INSTITUTION: Minnesota Univ., Minneapolis. Dept. of Psychology.
SPONS AGENCY: Office of Naval Research, Arlington, Va. Personnel and
    Training Research Programs Office.
REPORT NO: ONR/RR-8303
PUB DATE: Jul 83
CONTRACT: N00014-79-C-0172
NOTE: 42p.
PUB TYPE: Reports - Research/Technical (143)
EDRS PRICE: MF01/PC02 Plus Postage.
DESCRIPTORS: *Confidence Testing; *Multiple Choice Tests; *Probability;
    Response Style (Tests); *Scoring; Scoring Formulas; Secondary
    Education; Test Format; Testing Problems; Test Validity



ABSTRACT 

The present study was an attempt to alleviate some of
the difficulties inherent in multiple-choice items by having
examinees respond to multiple-choice items in a probabilistic manner.
Using this format, examinees are able to respond to each alternative
and to provide indications of any partial knowledge they may possess
concerning the item. The items used in this study were 30
multiple-choice analogy items. Examinees were asked to distribute 100
points among the four alternatives for each item according to how
confident they were that each alternative was the correct answer.
Each item was scored using five different scoring formulas. Three of
these scoring formulas were reproducing scoring systems. Results from
this study showed a small effect of certainty on the probabilistic
scores in terms of the validity of the scores but no effect at all on
the factor structure or internal consistency of the scores. Once the
effect of certainty on the probabilistic scores had been ruled out,
the five scoring formulas were compared in terms of validity,
reliability, and factor structure. There were no differences in the
validity of the scores from the different methods. (Author/BW)



********************************************************************* 

* Reproductions supplied by EDRS are the best that can be made 

* from the original document.

********************************************************************* 




Effect of Examinee Certainty on Probabilistic
Test Scores and a Comparison of Scoring 
Methods for Probabilistic Responses 



Debra Suhadolnik 
David J. Weiss 



Research Report 83-3
July 1983 

Computerized Adaptive Testing Laboratory 

Department of Psychology
University of Minnesota
Minneapolis, MN



This research was supported by funds from the
Air Force Office of Scientific Research, Air Force Human Resources
Laboratory, Army Research Institute, and Office of Naval Research,
and monitored by the Office of Naval Research.

Approved for public release; distribution unlimited.
Reproduction in whole or in part is permitted for
any purpose of the United States Government.



Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER: Research Report 83-3
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Effect of Examinee Certainty on Probabilistic
   Test Scores and a Comparison of Scoring Methods for Probabilistic
   Responses
5. TYPE OF REPORT & PERIOD COVERED: Technical Report
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Debra Suhadolnik and David J. Weiss
8. CONTRACT OR GRANT NUMBER(s): N00014-79-C-0172
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Department of Psychology,
   University of Minnesota, Minneapolis, Minnesota 55455
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
   T.A.: RR042-04-01; remaining entries [illegible]
11. CONTROLLING OFFICE NAME AND ADDRESS: Personnel and Training Research
   Programs, Office of Naval Research, Arlington, Virginia 22217
12. REPORT DATE: July 1983
13. NUMBER OF PAGES: 30
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report):
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
   distribution unlimited. Reproduction in whole or in part is permitted
   for any purpose of the United States Government.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if
   different from Report):
18. SUPPLEMENTARY NOTES: This research was supported by funds from the
   Air Force Office of Scientific Research, the Air Force Human Resources
   Laboratory, the Army Research Institute, and the Office of Naval
   Research, and monitored by the Office of Naval Research.
19. KEY WORDS (Continue on reverse side if necessary and identify by
   block number):
   Response formats
   Test item response formats
   Probabilistic responses
   Subjective probabilities
   Reproducing scoring systems
   Confidence-weighting procedures
   Response style variables in probabilistic responses
   Scoring methods for probabilistic responses



20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The present study was an attempt to alleviate some of the difficulties
inherent in multiple-choice items by having examinees respond to
multiple-choice items in a probabilistic manner. Using this format, examinees
are able to respond to each alternative and to provide indications of any
partial knowledge they may possess concerning the item. The items used in this
study were 30 multiple-choice analogy items. Examinees were asked to distribute
100 points among the four alternatives for each item according to how confident






they were that each alternative was the correct answer. Each item was scored
using five different scoring formulas. Three of these scoring formulas (the
spherical, quadratic, and truncated log scoring methods) were reproducing
scoring systems. The fourth scoring method used the probability assigned to
the correct alternative as the item score, and the fifth used a function of
the absolute difference between the correct response vector for the four
alternatives and the actual points assigned to each alternative as the item
score. Total test scores for all of the scoring methods were obtained by
summing individual item scores.

Several studies using probabilistic response methods have shown the effect of
a response-style variable, called certainty or risk taking, on scores obtained
from probabilistic responses. Results from this study showed a small effect
of certainty on the probabilistic scores in terms of the validity of the
scores but no effect at all on the factor structure or internal consistency of
the scores. Once the effect of certainty on the probabilistic scores had been
ruled out, the five scoring formulas were compared in terms of validity,
reliability, and factor structure. There were no differences in the validity
of the scores from the different methods, but scores obtained from the two
scoring formulas that were not reproducing scoring systems were more reliable
and had stronger first factors than the scores obtained using the reproducing
scoring systems. For practical use, however, the reproducing scoring systems
may have an advantage because they maximize examinees' scores when examinees
respond honestly, while honest responses will not necessarily maximize an
examinee's score with the other two methods. If a reproducing scoring system
is used for this reason, the spherical scoring formula is recommended, since
it was the most internally consistent and showed the strongest first factor of
the reproducing scoring systems.






Contents 



Introduction
    Item-Weighting Formulas
    Variations of the Response Format of Multiple-Choice Items
    Use of Subjective Probabilities with Multiple-Choice Items
    Extraneous Influences on the Use of Subjective Probabilities with
        Multiple-Choice Items
    Use of Alternate Item Types
    Purpose

Method
    Test Items
    Test Administration
    Item Scoring
    Determining the Effect of Certainty
    Evaluative Criteria

Results
    Score Intercorrelations
    Validity and Reliability
    Factor Analysis of Probabilistic Scores

Discussion and Conclusions
    The Influence of Certainty
    Choice Among Scoring Methods
    Conclusions

References

Appendix: Supplementary Tables



Technical Editor: Barbara Leslie Camm 






Effect of Examinee Certainty on Probabilistic Test Scores
and a Comparison of Scoring Methods for Probabilistic Responses



Psychometricians have searched for many years for a test item format that
would allow them to measure individual differences on a variable of interest as
accurately and as completely as possible. The multiple-choice item has proven
to be a useful tool for assessing knowledge, but there are several problems with
this item format. These problems include the possibility of an examinee
guessing the correct answer, the lack of information concerning the process used
by an examinee to obtain a given answer, and, in general, an inability to
accurately determine an examinee's level on a continuous underlying trait based
on an observable dichotomous response.

In attempts to remedy these problems and to extract the maximum amount of
information from an individual's responses to a set of test items, Lord and
Novick (1968, Chap. 14) have identified three important components of interest.

These components are

1. The measurement procedure, or the manner in which examinees are
   instructed to respond to the items.

2. The item scoring formula.

3. The method of weighting each item to form a total score.

In their attempts to find alternatives to the conventional multiple-choice item
where the examinee is instructed to choose the one best answer to an item from a
number of alternatives, investigators have generally focused on one or two of
these components at a time.


The various attempts to improve upon the traditional multiple-choice item
can be classified into three broad categories: (1) attempts to improve the
multiple-choice item by using an item-weighting formula other than the
conventional unit-weighting scheme, (2) variations of the multiple-choice item
that attempt to provide more information about an examinee's ability level by
asking the examinee to respond to a traditional multiple-choice item in a manner
other than simply choosing the one best alternative, and (3) the use of item
types which are completely different from the conventional multiple-choice item,
such as free-response items. The first category focuses on the third component
enumerated by Lord and Novick, the item-weighting formula. The second category
focuses on Lord and Novick's first two components (the measurement procedure and
item-scoring formulas) while continuing to use a unit-weighting scheme to
combine item scores into a total score. The third category focuses primarily on
the measurement procedure and, to a lesser extent, on item scoring formulas.

Item-Weighting Formulas 

For many years the accepted method of combining item scores to form a test
score was simply to sum all of the individual item scores. Since this procedure
is equivalent to multiplying each item score by an item weight of 1 and then
summing the weighted item scores, the method has been called unit weighting. In
attempts to increase the validity and/or the reliability of test scores obtained
by summing item scores, many researchers have abandoned unit weighting in favor
of various forms of differential weighting of individual items. These methods






of differential weighting of items include multiple regression techniques
(Wesman & Bennett, 1959), using the validity coefficient of the item as the item
weight (Guilford, 1941), weighting items by the reciprocal of the item standard
deviation (Terwilliger & Anderson, 1969), a priori item weights (Burt, 1950),
and numerous other weighting procedures (Bentler, 1968; Dunnette & Hoggatt, 1957;
Hendrickson, 1970; Horst, 1936; Wilks, 1938).
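The contrast between unit weighting and differential weighting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `total_score` and `reciprocal_sd_weights` are hypothetical, and the reciprocal-of-standard-deviation rule is shown simply because it is one of the weighting schemes named above, not because the present study used it.

```python
def total_score(item_scores, weights=None):
    """Weighted sum of item scores.

    With weights omitted, every item gets a weight of 1 (unit weighting),
    so the total is the plain sum of the item scores.
    """
    if weights is None:
        weights = [1.0] * len(item_scores)
    if len(weights) != len(item_scores):
        raise ValueError("one weight per item is required")
    return sum(w * x for w, x in zip(weights, item_scores))


def reciprocal_sd_weights(item_sds):
    """Hypothetical helper: weight each item by 1 / (item standard deviation)."""
    return [1.0 / sd for sd in item_sds]
```

Calling `total_score(scores)` reproduces conventional number-correct scoring when the item scores are 0 or 1; passing a weight list switches to differential weighting without changing the item scores themselves.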


In reviewing the substantial literature in this area, Wang and Stanley
(1970, p. 664) have concluded that "although differential weighting
theoretically promises to provide substantial gains in predictive or construct
validity, in practice these gains are often so slight that they do not seem to
justify the labor involved in deriving the weights and scoring with them. This
is especially true when the component measures are test items." Gulliksen (1950)
concluded, in concurrence with Wang and Stanley (1970), that differential
weighting is not worthwhile when a test contains more than approximately 10
items and when the items are highly correlated. Stanley and Wang (1970), after
concluding that differential item weighting is not a fruitful venture for test
items, have suggested that the item score be determined by the response made to
an item, where the examinee is required to do more than just select the correct
alternative for an item. By changing the mode of response and devising item
scoring formulas appropriate for these types of responses, the validity and/or
reliability of test scores might be increased. An additional gain might be more
insight into the process involved in responding to test items.

Variations of the Response Format of Multiple-Choice Items

Several of the earliest attempts at modification of the method of responding
to a conventional multiple-choice item were reported by Dressel and Schmid
(1953) in an investigation of various item types and scoring formulas. A
conventional multiple-choice test and one of four "experimental test forms" were
administered to each subject. The items in each of the experimental test forms
resembled conventional multiple-choice items in that an item stem and several
alternatives were provided, but each experimental test form differed from the
conventional multiple-choice format in the following ways:

1. Free-choice format. Examinees were instructed to choose as many of the
   alternatives provided as necessary to insure that they had chosen the
   correct alternative. This item format was scored using Equation 1,
   which yields integer scores that range from -4 to 4 and applies only to
   five-alternative items:

   \[ \text{Item score} = 4C - I \qquad [1] \]

   where C = number of correctly marked alternatives and
   I = number of incorrectly marked alternatives.

2. Degree-of-certainty test. Examinees were instructed to choose the one
   best answer for an item and then to choose one of four confidence
   ratings provided to indicate the degree of confidence they had in the
   answer they had chosen. This item format was scored as shown in Table 1.

3. Multiple-answer format. Each item contained more than one correct
   alternative, and the examinees were instructed to choose all of the
   correct alternatives. The score for this format was the number of correct
   alternatives chosen minus a correction factor for any incorrect
   alternatives chosen.






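Table 1

Scoring System for Degree-of-Certainty Test

                                           Item Score
                                  -----------------------------
                                  Correct          Incorrect
                                  Answer           Answer
Confidence Rating                 Chosen           Chosen
---------------------------------------------------------------
Positive                             4               -4
Fairly certain                       3               -3
Rational guess                       2               -2
No defensible basis for choice       1               -1
---------------------------------------------------------------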



4. Two-answer format. Each item contained exactly two correct
   alternatives, and the examinees were instructed to indicate both of the
   correct alternatives. The item score was simply the number of correct
   alternatives chosen.
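The free-choice rule (Equation 1) and the degree-of-certainty rule (Table 1) can be sketched as follows. This is a minimal illustration, assuming a single keyed answer per item and alternatives identified by letters; the names `free_choice_score`, `degree_of_certainty_score`, and `CONFIDENCE_POINTS` are hypothetical and do not come from Dressel and Schmid (1953).

```python
# Points attached to each confidence rating in Table 1.
CONFIDENCE_POINTS = {
    "positive": 4,
    "fairly certain": 3,
    "rational guess": 2,
    "no defensible basis for choice": 1,
}


def free_choice_score(marked, correct):
    """Equation 1 for a five-alternative item: 4C - I.

    C is 1 if the keyed alternative was among those marked (0 otherwise);
    I is the number of marked alternatives that are not the keyed one.
    """
    marked = set(marked)
    c = 1 if correct in marked else 0
    i = len(marked - {correct})
    return 4 * c - i


def degree_of_certainty_score(chosen, correct, rating):
    """Table 1: plus the rating's points if right, minus them if wrong."""
    points = CONFIDENCE_POINTS[rating]
    return points if chosen == correct else -points
```

Marking only the keyed alternative gives the maximum of 4, and marking all four wrong alternatives gives the minimum of -4, which matches the score range stated for Equation 1.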

In comparing these five test forms (the conventional multiple-choice format
and the four experimental test formats), Dressel and Schmid's (1953) results
showed that the experimental test formats containing more than one correct
alternative (Formats 3 and 4 above) exhibited greater internal consistency
reliability than the other three test forms, but these test formats also took
longer to administer than all of the other formats. All of the experimental
test formats had higher internal-consistency reliability than the conventional
multiple-choice test except for the free-choice format, but the conventional
multiple-choice format took less time than any of the experimental test formats.
Although the higher reliability coefficients of several of these formats
(Formats 2, 3, and 4) might suggest that these formats aid in introducing more
ability variance than error variance, the authors warn that the results must be
viewed with caution, since there were statistically significant differences
between the groups taking each experimental form on the standard multiple-choice
test that was administered to all of their subjects; thus, the differences
attributed to the effect of test format might be due to systematic ability
differences in the groups taking each of the experimental test formats.

Hopkins, Hakstian, and Hopkins (1973) used a confidence weighting procedure
similar to the degree-of-certainty test used by Dressel and Schmid (1953) and
reported higher split-half reliability coefficients for the confidence weighting
format than for a conventional multiple-choice test using the same items.
Hopkins et al. (1973) also reported validity coefficients that were correlations
between the test scores and a short-answer form of the same test. The validity
coefficient for the conventional test (.70) was higher but not significantly
different from that of the confidence weighting format (.67).

Coombs (1953) felt that examinees could provide more information about the
degree of knowledge they possessed by eliminating the alternatives which they
felt were incorrect, rather than by choosing the one correct alternative. Items
using this format were scored by assigning one point for each incorrect
alternative eliminated and 1 - K points when the correct alternative was
eliminated, where K is the number of alternatives provided. This scoring system
yields a






range of integer item scores from -3 to 3 for a four-alternative multiple-choice
item.
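The elimination scoring rule just described can be sketched in Python. This is a minimal illustration, assuming one keyed answer per item; the function name `elimination_score` is hypothetical.

```python
def elimination_score(eliminated, correct, n_alternatives):
    """Elimination scoring as described for Coombs (1953).

    One point is earned for each incorrect alternative eliminated, and
    1 - K points (K = number of alternatives) are added when the keyed
    alternative is itself eliminated.
    """
    eliminated = set(eliminated)
    wrong_eliminated = len(eliminated - {correct})
    penalty = (1 - n_alternatives) if correct in eliminated else 0
    return wrong_eliminated + penalty
```

For a four-alternative item, eliminating all three wrong alternatives gives 3 and eliminating only the keyed alternative gives -3, which reproduces the stated range of item scores.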

In comparing this test format with a conventional multiple-choice test,
Coombs, Milholland, and Womer (1956) found no differences in validity between
the two formats for separate tests of vocabulary, spatial visualization, and
driver information. The validity coefficients used were correlations between
test scores and criteria such as Stanford-Binet IQ, another test of spatial
ability, and subtest scores from the Differential Aptitude Test. For these same
content areas, the experimental test format yielded higher reliability estimates
than the conventional test, but the differences between the estimates were not
statistically significant for any of the content areas. One result in favor of
the experimental test format was that the subjects in the experiment felt the
experimental format to be fairer than the conventional format.

Another variation upon the conventional multiple-choice item includes a
self-scoring method advocated by Gilman and Ferry (1972), which requires
examinees to choose among alternatives provided until the correct alternative is
chosen. Feedback is given after each choice is made. The item score is simply
the number of responses needed to choose the correct alternative; thus, a higher
score indicates less knowledge about an item. Kane and Moloney (1974) have
warned that although Gilman and Ferry (1972) found an increase in split-half
reliability using this technique, the effect of using this method on the
reliability of the test depends upon the ability of the distractors to
discriminate between examinees of varying levels of ability. An increase in
reliability will result when the distractors possess this ability to
discriminate among ability levels, but no increase in reliability will occur if
this is not the case.

Use of Subjective Probabilities with Multiple-Choice Items

A modification of the traditional multiple-choice item that has generated
much research and interest is the use of examinees' subjective probabilities
concerning the degree of correctness of each alternative provided for an item as
a method of assessing the degree of knowledge or ability possessed by the
examinees. By assigning a probability estimate for each alternative to an item,
examinees can indicate degrees of partial knowledge they may have concerning
each alternative for an item.

To simplify this procedure for examinees, a number of methods have been
devised to aid examinees in assigning their subjective probabilities to the
alternatives. One method is to ask examinees to directly assign probabilities
from 0 to 1.00 to each alternative, with the restriction that the probabilities
assigned to all of the alternatives for each item sum to 1.00. Another method
instructs examinees to distribute 100 points among the alternatives for each
item. The distributed points are then converted to probabilities for scoring
purposes by dividing the points assigned to each alternative by 100. Some
investigators have used fewer points for distribution (Rippey, 1970) or symbols,
such as a certain number of stars, which are to be distributed among the
alternatives (deFinetti, 1965), but the concept is the same.
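The conversion from distributed points to probabilities can be sketched as below. This is a minimal illustration; the function name `points_to_probabilities` and the choice to reject responses whose points do not sum to the required total are assumptions, since the text does not say how invalid responses were handled.

```python
def points_to_probabilities(points, total=100):
    """Convert points distributed among alternatives into probabilities.

    Each alternative's points are divided by the total to be distributed
    (100 in the format described above; a smaller total models the
    fewer-points or stars variants).
    """
    if any(p < 0 for p in points):
        raise ValueError("points must be non-negative")
    if sum(points) != total:
        raise ValueError("points must sum to the total to be distributed")
    return [p / total for p in points]
```

For example, `points_to_probabilities([70, 20, 10, 0])` yields the response vector that the scoring formulas then operate on.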

Using these types of measurement procedures (sometimes called probabilistic 
item formats or probabilistic response formats), an item scoring formula had to 






be devised so that examinees' expected scores would be maximized only when they
responded according to their actual beliefs concerning the correctness of each
alternative. Item-scoring formulas which satisfy these conditions are called
reproducing scoring systems (RSS). Shuford, Albert, and Massengill (1966) and
deFinetti (1965) provide examples of several RSSs. The RSSs presented by these
two authors for use with multiple-choice items that have more than two
alternatives and only one correct answer are the following:
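1. Spherical RSS

   \[ \text{Item score} = \frac{p_c}{\sqrt{\sum_{k=1}^{m} p_k^2}} \qquad [2] \]

   where p_c = probability assigned to the correct alternative and
   p_k = probability assigned to alternative k (k = 1, 2, ..., m).

2. Quadratic RSS

   \[ \text{Item score} = 2p_c - \sum_{k=1}^{m} p_k^2 \qquad [3] \]

3. Truncated Logarithmic Scoring System

   \[ \text{Item score} =
      \begin{cases}
        1 + \log(p_c), & .01 < p_c \le 1.00 \\
        -1,            & 0 \le p_c \le .01
      \end{cases} \qquad [4] \]

   or a modification of this scoring function:

   \[ \text{Item score} =
      \begin{cases}
        [2 + \log(p_c)]/2, & .01 \le p_c \le 1.00 \\
        0,                 & 0 \le p_c < .01
      \end{cases} \qquad [5] \]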
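The three scoring systems listed above can be sketched as Python functions. This is an illustration under stated assumptions rather than the authors' scoring program: the function names are hypothetical, alternatives are indexed from 0, and the logarithm is taken as base 10, which is the base that makes 1 + log(.01) equal the truncation value of -1.

```python
import math


def spherical_score(probs, correct):
    """Spherical RSS: p_c divided by the length of the response vector."""
    return probs[correct] / math.sqrt(sum(p * p for p in probs))


def quadratic_score(probs, correct):
    """Quadratic RSS: 2 * p_c minus the sum of squared probabilities."""
    return 2 * probs[correct] - sum(p * p for p in probs)


def truncated_log_score(probs, correct):
    """Truncated logarithmic score: 1 + log10(p_c), floored at -1."""
    p_c = probs[correct]
    return 1 + math.log10(p_c) if p_c > 0.01 else -1.0


def modified_truncated_log_score(probs, correct):
    """Modified form rescaled to run from 0 to 1: (2 + log10(p_c)) / 2."""
    p_c = probs[correct]
    return (2 + math.log10(p_c)) / 2 if p_c >= 0.01 else 0.0
```

A total test score under any one of these rules is the sum of that rule's item scores, for example `sum(quadratic_score(p, key) for p, key in zip(responses, keys))`, where `responses` and `keys` are hypothetical lists of response vectors and keyed alternatives.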

The truncated logarithmic scoring system is technically not an RSS, but it does
have the properties of an RSS for probabilities between .027 and .973. According
to Shuford et al. (1966), when examinees believe that an alternative has a
probability of being the correct answer less than or equal to .027, their score
will be maximized by assigning a probability of zero to that alternative.
Alternatively, when examinees believe that an alternative has a probability
greater than or equal to .973, their expected score will be maximized by
assigning a probability of 1.00 to that alternative. Shuford et al. (1966)
stated that "for extreme values of (p_k), some information about the student's
degree-of-belief probabilities is lost, but from the point of view of
applications, the loss in accuracy is insignificant" (p. 137). Note also that
the truncated logarithmic scoring function is the only one of the scoring
formulas that is dependent only upon the probability assigned to the correct
alternative.
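The defining property of a reproducing scoring system, that the expected score is maximized by reporting one's actual beliefs, can be checked numerically. The sketch below is an illustration only: it assumes a three-alternative item, a hypothetical belief vector of (.6, .3, .1), and a coarse grid search over reports in steps of .05. It compares the quadratic and spherical rules with a rule that simply awards the probability assigned to the correct alternative; all function names are hypothetical.

```python
import math
from itertools import product


def quadratic(report, correct):
    return 2 * report[correct] - sum(p * p for p in report)


def spherical(report, correct):
    return report[correct] / math.sqrt(sum(p * p for p in report))


def linear(report, correct):
    # Score equals the probability assigned to the correct alternative.
    return report[correct]


def expected_score(rule, belief, report):
    """Expected item score when alternative k is correct with probability belief[k]."""
    return sum(b * rule(report, k) for k, b in enumerate(belief))


def best_report(rule, belief, steps=20):
    """Grid-search the report (in units of 1/steps) with the highest expected score."""
    best, best_value = None, -math.inf
    for counts in product(range(steps + 1), repeat=len(belief)):
        if sum(counts) != steps:
            continue  # keep only reports whose probabilities sum to 1
        report = tuple(c / steps for c in counts)
        value = expected_score(rule, belief, report)
        if value > best_value:
            best, best_value = report, value
    return best


belief = (0.6, 0.3, 0.1)
for rule in (quadratic, spherical, linear):
    print(rule.__name__, best_report(rule, belief))
```

Under these assumptions the search returns the belief vector itself for the quadratic and spherical rules, whereas the linear rule is maximized by placing all of the probability on the most likely alternative. This illustrates why honest responding is not necessarily optimal under a scoring method that is not an RSS.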

Total test scores for examinees are obtained for all of the RSSs by simply
summing the individual item scores obtained using that particular scoring
formula. In addition to the conditions expressed above for an RSS, deFinetti
(1965) has stated that the validity of any reproducing scoring system also rests
upon the following assumptions:

1. The examinees are capable of assigning numerical values to their
   subjective probabilities.

2. The examinees are trained in using the response format and understand
   the scoring system to be used in scoring the items.

3. The examinees are motivated to do their best on the items.



Rippey (1968) reported results from several studies comparing test scores
obtained using the spherical RSS and the modification of the truncated
logarithmic scoring functions with test scores obtained by summing dichotomous
(0,1) item scores to conventional multiple-choice items. In general, he found
increases in Hoyt's reliability coefficient using a probabilistic response
format with RSSs under limited conditions. The probabilistic test format
produced increases in test reliability with undergraduate college students but
could not be used with fourth graders and produced no consistent increases in
reliability for tests given to high school freshmen or medical students. There
were also no consistent tendencies for one or the other of the scoring formulas
for the probabilistic response format to produce higher reliability
coefficients.

Rippey (1970) compared the reliabilities of five different methods of scoring
probabilistic item responses. Three of these methods were RSSs; the fourth
was simply the probability assigned to the correct answer, and the fifth was a
dichotomous scoring of the probabilistic responses, which resulted in an item
score of 1 if the probability assigned to the correct answer was greater than
the probability assigned to any other alternative and a score of 0 otherwise.
The three RSSs used were the modification of the truncated log scoring function,
the spherical RSS, and another RSS called the Euclidean RSS. An item score
using the Euclidean RSS is computed using the following equation:

\[ \text{Item score} = 1 - \ldots \qquad [6] \]

where p_k = probability assigned to alternative k (k = 1, 2, ..., m), and
x_k = criterion group mean probability assigned to alternative k.

Using Hoyt's reliability coefficient, Rippey found that the test scores
obtained by summing the probabilities assigned to the correct answer yielded
higher average reliability coefficients (.69) than any of the other scoring
methods and that the dichotomous scoring of the probabilistic responses yielded
the lowest average reliability of the five methods (.47), although it was not
much lower than those of the three RSSs (.49, .50, and .58).
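The two non-RSS methods in Rippey's (1970) comparison are simple enough to sketch directly. This is a minimal illustration with hypothetical function names; treating a tie between the correct alternative and another alternative as a score of 0 follows from the "greater than" wording above.

```python
def probability_correct_score(probs, correct):
    """Item score is the probability assigned to the correct alternative."""
    return probs[correct]


def dichotomous_score(probs, correct):
    """1 if the correct alternative received strictly more probability
    than every other alternative, 0 otherwise (ties score 0)."""
    p_c = probs[correct]
    others = [p for k, p in enumerate(probs) if k != correct]
    return 1 if all(p_c > p for p in others) else 0
```

The dichotomous rule discards everything in the response vector except which alternative was ranked first, which is one way to see why it behaves like conventional number-correct scoring.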


In comparing two RSSs (quadratic and the modification of the truncated
logarithmic scoring functions) with conventional multiple-choice test scores,
Koehler (1971) found no significant differences between internal consistency
reliability coefficients for the test scores obtained using the two RSSs and the
test scores from the conventional multiple-choice items. He found evidence of
convergent validity for both the probabilistic and conventional item formats
and, on the basis of this evidence, suggested the use of conventional tests,
since they are "easier to administer, take less testing time, and do not require
the training of subjects in the intricacies of the confidence-marking
procedures" (p. 302). However, his conclusions must be viewed with caution,
since each of his tests consisted of only 10 items.






Extraneous Influences on the Use of
Subjective Probabilities with Multiple-Choice Items

Although Koehler's results may not be generalizable due to the small number
of items administered in each format, the use of probabilistic response formats
has been questioned for other reasons. Hansen (1971), [name illegible]
(1967), Echternacht, Boldt, and Sellman (1972), Koehler (1974), and [name
illegible] and Brunza (1974), along with several others, have investigated the
possibility that the increase in reliability demonstrated by probabilistic item
formats is due to the effect of a personality variable or response style
variable rather than a more accurate assessment of knowledge. This variable has
been variously called risk taking, certainty, confidence, and cautiousness. If
it is the effect of this response style variable that leads to increases in
reliability for probabilistic responding over conventional multiple-choice
items, this effect might also explain the fact that the probabilistic item
format has not, in general, led to increases in the validity of these test
scores over that of test scores obtained from conventional multiple-choice
items.

Studies investigating the influence of these various personality variables
have shown mixed results. In studies where conventional multiple-choice item
scores and probabilistic item scores were obtained (Koehler, 1974; Echternacht,
Sellman, Boldt, & Young, 1971), the correlations between the two types of scores
have been consistently high (.71 to .83 for the Koehler (1974) study and .89 to
.99 for the Echternacht et al. (1971) study). This suggests that a large
portion of the variation in the probabilistic test scores can be accounted for
by the conventional test scores. The question being posed, though, is whether
the variation in the probabilistic test scores that cannot be accounted for by
the conventional test scores is reliable variance due to increased accuracy of
assessment of knowledge or due to personality or response style variables.

To determine the influence of these personality factors, Koehler (1974)
embedded seven nonsense items in a 40-item vocabulary test and instructed
examinees that they were not to guess the answers to any items on the test. The
nonsense items were items with no correct alternatives. From responses to these
nonsense items, he calculated two confidence measures:

C_1 = proportion of nonsense items attempted under do-not-guess instructions,

and

\[ C_2 = \ldots \qquad [7] \]

where m = number of alternatives,
n = number of nonsense items, and
p_ij = probability assigned to alternative i on item j.

Since the nonsense items had no correct alternatives, an examinee's responses
to these items were a pure measure of a response style or personality variable
(confidence) that was influencing that examinee's responses. Responses to
these items were not due to any knowledge the examinee possessed, since there
were no correct answers to those items. The greater the deviation of these
indices from 0, the higher the level of confidence exhibited by the examinee.




Koehler found that both of these confidence indices were significantly
negatively correlated with three probabilistic test scores (spherical,
quadratic, and the modification of the truncated logarithmic scoring functions),
but not significantly correlated with the number-correct scores from the same
items. The number-correct scores also yielded a higher internal consistency
reliability coefficient than the three probabilistic scores (.85 versus .82,
.80, and .74). On the basis of these results, Koehler did not recommend the use
of probabilistic response formats, since "it would appear ... that confidence
responding methods produce variability in scores that cannot be attributed to
knowledge of subject matter" (p. 4).

Hansen (1971) obtained probabilistic test scores and scores on independent
measures of personality factors such as risk taking and test anxiety. He developed
a measure of certainty in responding to probabilistic response formats
which is essentially the average absolute deviation of a response vector to an
item from a response vector assigning equal probabilities to all alternatives.
Hansen's study showed that this certainty index was related to risk taking as
measured by the Kogan and Wallach Choice Dilemmas Questionnaire and authoritarianism
as measured by a version of the F-scale developed by Christie, Havel, and
Seidenberg (1958). However, the certainty index did not correlate significantly
with scores on a test anxiety questionnaire or scores on the Gough-Sanford Rigidity
Scale.

These results provide more information concerning the nature of the response
style, but there are problems with Hansen's (1971) certainty index, which
he attempted to alleviate but did not. The major problem with this index is
that it is not a pure measure of certainty. This certainty measure is confounded
by an examinee's knowledge concerning an item. Hansen attempted to partial
out examinees' knowledge by using their test scores as a predictor in a
regression equation to obtain predicted certainty scores. These predicted certainty
scores were then subtracted from the observed certainty scores to obtain
a certainty measure free from the influence of examinee knowledge.

Although the rationale is sound, Hansen did not accomplish what he set out
to do. The test score he used as a predictor was not a pure or even relatively
pure measure of knowledge. The test scores were probabilistic test scores computed
from the spherical RSS. This scoring system results in scores that represent
a confounding of certainty and knowledge. Therefore, when these
probabilistic test scores are partialled from the certainty index, it is unclear exactly what
the residual certainty index represents, since both knowledge and some certainty
have been partialled out. Hansen's results were then based upon the relationship
of various personality variables with a certainty index confounded with
knowledge, and the relationship of these same personality variables with a residual
certainty index whose composition is somewhat ambiguous. Hansen's results
might best be viewed with caution.

Pugh and Brunza (1974) conducted a study similar to that of Hansen (1971),
except that they used a 24-item vocabulary test and scored it using the probability
assigned to the correct answer as the item score. They also obtained
scores on an independent nonprobabilistically scored vocabulary test, and measures
of risk taking, degree of external control, and cautiousness. They followed
Hansen's regression procedure to obtain a certainty measure free of the






confounding effects of knowledge and were more successful than Hansen. They
used the independent vocabulary test score as a predictor of the same certainty
index that Hansen used and then calculated a residual certainty index by subtracting
the predicted certainty score from the observed certainty score. Since
the independent vocabulary test was a relatively pure measure of knowledge, partialling
its effect from the observed certainty index resulted in a residual
certainty index that (1) was a measure of the certainty displayed in responding
to multiple-choice items in a probabilistic fashion and (2) was not related to
knowledge possessed by examinees concerning the items.

Pugh and Brunza (1974) reported that this residual certainty measure was
not very reliable (.32 internal consistency reliability) and that it correlated
significantly with risk-taking scores obtained from the Kogan and Wallach Choice
Dilemmas Questionnaire but not with the measures of cautiousness and external
control they had obtained. Although this evidence of the influence of variables
other than knowledge on probabilistic test scores might serve as a deterrent to
the use of these scoring systems, Pugh and Brunza noted that "there is no evidence
in either study [Pugh & Brunza, 1974, or Hansen, 1971] that these factors
are more operative than in traditional tests" (p. 6).

Echternacht et al. (1971) scored answer sheets of daily quizzes obtained
from two Air Force training courses using a truncated logarithmic scoring function
and number correct. They found that using the number-correct score, the
shift of the trainees, and a number of personality variables such as test anxiety,
risk taking, and rigidity as predictors of the probabilistic test scores did
not account for significantly more of the variation in the probabilistic test
scores than was accounted for when using only number-correct scores and shift of
the trainees as predictors. This is evidence that the personality variables did
not operate to a greater extent in a probabilistic testing situation than in a
conventional multiple-choice testing situation.

Thus, these studies show some relationship of probabilistic test scores to
personality variables (primarily risk-taking tendencies), but they also show
that these influences do not seem to be greater in probabilistic testing situations
than in conventional testing situations.

Use of Alternate Item Types

The research reviewed above relied on the multiple-choice item type and
varied the method of responding to that type of item; however, some researchers
have advocated the use of entirely different item types, such as free-response
items, to aid in the assessment of partial knowledge. Some of these alternate
item types avoid many of the problems inherent in multiple-choice items but are
subject to problems of their own. For example, the free-response item type
avoids the problem of random guessing among a number of alternatives and has the
potential to provide a large amount of information concerning what the examinee
does or does not know, but it is also more time-consuming to administer and
score, and may cover much less material than is possible with a multiple-choice
format. Consequently, if there are any time constraints on testing, fewer items
can be administered. Practical problems with scoring many of these alternate
item types have prevented widespread use of several of them.






Purpose

Although comparisons of the psychometric properties of multiple-choice
items with several alternate item types are planned, the present research focused
on comparisons of the probabilistic response format. This study
attempted to answer the following questions:

1. Does a personality variable such as certainty affect probabilistic test
scores on an ability test to a greater degree than it affects conventional
test scores on the same ability test?

2. If the effect of a personality variable can be discounted, what types
of scoring systems are best for multiple-choice items on an ability
test requiring probabilistic responses?

Method 

Test Items 

Thirty multiple-choice analogy items were chosen from a pool of items obtained
from Educational Testing Service (ETS) containing former SCAT and STEP
items. Each item consisted of an item stem and four alternatives. The pool of
items had been parameterized by ETS on groups of high school students using the
computer program LOGIST (Wood, Wingersky, & Lord, 1976) with a three-parameter
logistic model, resulting in item response theory discrimination, difficulty,
and guessing parameters calculated from large numbers of examinees for each
item. The 30 items were chosen from a pool of approximately 300 analogy items
to represent a uniform range of discrimination and difficulty parameters. The
parameters for the chosen items are in Appendix Table A. The item discrimination
parameters ranged from approximately a = .6 to a = 1.4, with a mean of .975
and a standard deviation of .244, while the difficulty parameters ranged from
approximately b = -.5 to b = 2.5, with a mean of .961 and a standard deviation
of .867. The range of difficulty parameters was not chosen to be symmetric
about zero because the available examinees constituted a more select group than
the group whose responses were used to parameterize the items. The guessing
parameters for these items ranged from c = .09 to c = .38, with a mean of .20
and a standard deviation of .06.

Test Administration 

The 30 multiple-choice analogy items chosen were then administered to 299
psychology and biology undergraduate students at the University of Minnesota
during the 1979-1980 academic year. Students received two points toward their
course grade (either introductory psychology or biology) for their participation.
Items were administered by computer to permit checking of responses to
be sure that item response instructions were carefully followed.

The examinees were instructed to respond to each item by assigning a proba- 
bility to each of the four alternatives. This probability was to correspond to 
the examinee's belief in the correctness of each alternative, with the addition- 
al restriction that the probabilities assigned to all of the alternatives for an 






item sum to one. Specifically, for each item, the examinees were asked to distribute
100 points among the four alternatives provided for each item according
to their belief as to whether or not the alternative was the correct alternative
for that item. The total number of points assigned to all of the alternatives
for an item had to equal 100. Since the tests were computer administered, item
responses were summed immediately to ensure that the responses to the alternatives
did indeed sum to 100 (sums of 99 and 101 were also considered valid to
allow for rounding). The points assigned to each alternative were then converted
into probabilities by dividing the response to each alternative by 100.
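
The response check and conversion described above can be sketched in a few lines of Python. This is only an illustration, not the program used in the study; the function names, the `tolerance` parameter, and the non-negativity check are assumptions introduced here.

```python
def has_valid_total(points, tolerance=1):
    """True if the points for one item sum to 100 within the tolerance.

    A tolerance of 1 accepts totals of 99, 100, and 101, as in the text.
    """
    return abs(sum(points) - 100) <= tolerance


def points_to_probabilities(points, tolerance=1):
    """Convert the points given to each alternative into probabilities.

    Raises ValueError for an invalid response so that a caller could ask
    the examinee to reenter it (hypothetical behavior for this sketch).
    """
    if any(p < 0 for p in points):
        raise ValueError("points must be non-negative")
    if not has_valid_total(points, tolerance):
        raise ValueError(f"points sum to {sum(points)}, expected 99 to 101")
    # Each response is divided by 100, as described in the text.
    return [p / 100 for p in points]


print(points_to_probabilities([20, 60, 20, 0]))
```

Under this literal reading, a response totaling 99 or 101 points yields probabilities that sum to .99 or 1.01; whether such responses were rescaled is not stated in the text.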

To ensure that the examinees understood both how to use the computer and
how to respond to the multiple-choice items in a probabilistic fashion, a detailed
set of instructions preceded each test (see Appendix Table B). If an
examinee responded incorrectly to an instruction, the computer would display an
appropriate error message on the screen and the examinee would have to respond
correctly before proceeding to the next screen. If an examinee again responded
inappropriately to an instruction, a test proctor was called by the computer
to provide additional help to the examinee in understanding the instructions.
Several examples and explanations of methods of responding to probabilistic
items were provided. Examinees, with few exceptions, did not have any
difficulty understanding how to respond to the items. If, in responding to an
item, an examinee's responses did not sum to 99, 100, or 101, the examinee was
immediately asked to reenter his/her responses until an appropriate sum was entered.



Item Scoring 

The item responses obtained from these 299 examinees were then scored using
five different scoring formulas to determine which of these scoring formulas
yielded the most reliable and valid scores. The five different scoring formulas
used were:

1. The probability assigned to the correct alternative by the examinee
(PACA) was used as the item score. This scoring formula yields scores
that range from 0 to 1.00.

2. The second type of item score (AIKEN) was computed from a variation of a
scoring formula developed by Aiken (1970), which is a function of the
absolute difference between the correct response vector for an item and
the obtained response vector:

$$\text{Item score} = 1 - \frac{D}{D_{\max}} \qquad [8]$$

where

$$D = \sum_{i=1}^{m} \left| p_{ai} - p_{ci} \right| \qquad [9]$$

$m$ = number of alternatives,
$p_{ai}$ = probability assigned to alternative $i$ by the examinee,
$p_{ci}$ = expected probability for alternative $i$, and
$D_{\max}$ = maximum value of $D$, which was 2.00 for all of these items.

Each correct response vector would contain three 0's and one 1, while






the obtained response vector would contain four probabilities that sum
to 1.00. For example, for an item where the second alternative was the
correct alternative, the correct response vector would be 0, 1.00, 0, 0.
A response vector that might have been obtained for this item is .20,
.60, .20, 0. For this obtained response vector the item score would be
computed as follows:

$$\text{Item score} = 1 - \frac{|0 - .20| + |1.00 - .60| + |0 - .20| + |0 - 0|}{2.00} = 1 - \frac{.80}{2.00} = .60 \qquad [10]$$

This scoring formula also yields scores that range from 0 to 1.00.
3. The quadratic RSS (QUAD) is defined by Equation 3. This scoring formula
yields scores that range from -1.00 to 1.00.

4. The spherical RSS (SPHER) is defined in Equation 2. This scoring formula
yields scores that range from 0 to 1.00.

5. A modification of the truncated logarithmic scoring function (TLOG).
This scoring formula is a very good approximation to the logarithmic RSS
throughout most of the possible score range, and is defined by Equation 5.
This scoring formula yields scores from 0 to 1.00. The actual formula used
here to obtain scores via a truncated logarithmic scoring function utilizes
a scaling factor of 5 rather than the usual scaling factor of 1 or 2. It was
necessary to increase this scaling factor to maintain a logical progression
of scores, since the probability assigned to the correct answer for some
items was as low as .01. Since the (natural) log of .01 is -4.6052, the scaling
factor had to be a 5 (actually only some number slightly higher than
4.6052) in order that the scores progress in an orderly fashion from 0
to 1.00 according to the probability assigned to the correct answer.
This alleviated the problem of assigning negative scores to examinees
who had assigned very small probabilities to the correct answer while
assigning a score of 0 (a higher score) to examinees who had assigned a
zero probability to the correct answer. The actual TLOG scoring formula
used is Equation 11.

$$\text{Item score} = \begin{cases} \dfrac{5 + \log(p_c)}{5}, & .01 \le p_c \le 1.00 \\[1ex] 0, & 0 \le p_c < .01 \end{cases} \qquad [11]$$
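
Equation 11 can be sketched in Python as follows. This is an illustrative sketch rather than the original scoring program: the function name `tlog_item_score` and its parameters are hypothetical, and the natural logarithm is assumed because the text gives the log of .01 as -4.6052.

```python
import math


def tlog_item_score(p_correct, scale=5.0, floor=0.01):
    """Truncated logarithmic item score in the form of Equation 11.

    p_correct is the probability assigned to the correct alternative.
    Probabilities below `floor` receive a score of 0; otherwise the score
    is (scale + ln p) / scale, which equals 1.00 when p_correct is 1.00.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie between 0 and 1")
    if p_correct < floor:
        return 0.0
    return (scale + math.log(p_correct)) / scale


# With a scaling factor of 5 the lowest non-truncated score stays positive;
# with the more usual factor of 2 it would fall below the truncated score of 0.
print(tlog_item_score(0.01))
print(tlog_item_score(0.01, scale=2.0))
```

The second call shows why a scaling factor larger than 4.6052 was needed: with a factor of 2, an examinee assigning .01 to the correct answer would score lower than one assigning 0.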



Total test scores for all of the scoring methods were obtained by summing the
item scores for each of the 30 items.
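
As a rough illustration of the PACA and AIKEN item scores and of the summing step, the following Python sketch reproduces the worked example for the response vector .20, .60, .20, 0. The function names and the list-based data layout are assumptions made for this sketch; the correct response vector is taken to be a 1 for the keyed alternative and 0 elsewhere, as in the text.

```python
def paca_item_score(probs, correct_index):
    """Probability assigned to the correct alternative (PACA)."""
    return probs[correct_index]


def aiken_item_score(probs, correct_index, d_max=2.0):
    """AIKEN item score: 1 - D / D_max, with D as in Equation 9."""
    d = sum(
        abs(p - (1.0 if i == correct_index else 0.0))
        for i, p in enumerate(probs)
    )
    return 1.0 - d / d_max


def total_score(item_scores):
    """Total test score: the sum of the item scores."""
    return sum(item_scores)


response = [0.20, 0.60, 0.20, 0.0]   # obtained response vector
key = 1                              # second alternative is correct (0-based)
print(paca_item_score(response, key))
print(aiken_item_score(response, key))
```

Both functions give approximately .60 for this response, apart from floating-point rounding.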

Determining the Effect of Certainty 

To determine the effect of an examinee's certainty or propensity to take 






risks when responding to probabilistic items, Hansen's (1971) certainty index
was computed for each examinee using the following formula:

[12]

where
$n$ = number of items in test,
$m_j$ = number of alternatives for item $j$, and
$p_{ij}$ = probability assigned to alternative $i$ of item $j$.
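
Hansen's index is described in this report only in words, as an average absolute deviation of each response vector from equal probabilities. The Python sketch below follows that verbal description; the exact scaling of the published index is an assumption here (no further normalization is applied), and the function name is hypothetical.

```python
def certainty_index(responses):
    """Mean over items of the summed absolute deviation from 1/m_j.

    `responses` is a list of items; each item is a list of the
    probabilities an examinee assigned to that item's alternatives.
    ASSUMPTION: no additional scaling constant is applied.
    """
    per_item = []
    for probs in responses:
        m = len(probs)
        per_item.append(sum(abs(p - 1.0 / m) for p in probs))
    return sum(per_item) / len(per_item)


# An examinee who spreads points evenly shows no certainty ...
print(certainty_index([[0.25, 0.25, 0.25, 0.25]]))
# ... while putting everything on one alternative gives the maximum,
# whether or not that alternative happens to be the correct one.
print(certainty_index([[1.0, 0.0, 0.0, 0.0]]))
print(certainty_index([[0.0, 1.0, 0.0, 0.0]]))
```

Because the index ignores which alternative is correct, a knowledgeable examinee and a confidently wrong one receive the same value, which is one way to see why the index mixes knowledge with certainty.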

This certainty index is a function of the absolute difference between the probabilities
assigned to the four alternatives and .25, averaged over items. Since
the probabilities assigned to each alternative are dependent upon both an examinee's
knowledge and his/her level of certainty, this certainty index is not a
"pure" measure of certainty, but is confounded with knowledge about the item.

To determine the effect of this response style variable, it was first necessary
to obtain a "pure" measure of certainty. This relatively pure measure of
certainty was obtained by scoring the probabilistic responses dichotomously and
then partialling the effect of this knowledge variable out of the certainty indices.
A dichotomous test score was obtained from the probabilistic responses
by making the assumption that under conventional "choose-the-correct-answer"
instructions, examinees would choose the alternative to which they assigned the
highest probability under the probabilistic instructions. Thus, for each item,
the alternative assigned the highest probability by the examinee was chosen as
the alternative the examinee would have chosen under traditional multiple-choice
instructions. A score of 1 was assigned if that alternative was the correct
answer and a score of 0 was assigned otherwise. When more than one alternative
was assigned the highest probability, one of those alternatives was randomly
chosen as the alternative the examinee would have chosen. This procedure attempted
to simulate the decision-making process of an examinee in choosing a
correct answer to an item.
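
The dichotomous rescoring rule just described can be sketched as follows. This is a minimal Python illustration under the stated rule (highest probability wins, ties broken at random); the function names and the optional `rng` argument are assumptions of this sketch, not details from the study.

```python
import random


def dichotomous_item_score(probs, correct_index, rng=random):
    """Score 1 if the simulated choice is the correct alternative, else 0.

    The simulated choice is the alternative with the highest assigned
    probability; ties for the highest probability are broken at random.
    """
    top = max(probs)
    candidates = [i for i, p in enumerate(probs) if p == top]
    chosen = rng.choice(candidates)
    return 1 if chosen == correct_index else 0


def dichotomous_test_score(responses, key, rng=random):
    """Sum of dichotomous item scores over all items."""
    return sum(
        dichotomous_item_score(probs, correct, rng)
        for probs, correct in zip(responses, key)
    )


responses = [[0.20, 0.60, 0.20, 0.0], [0.70, 0.10, 0.10, 0.10]]
key = [1, 2]  # 0-based indices of the correct alternatives
print(dichotomous_test_score(responses, key))
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible, which may be convenient when rescoring the same data more than once.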

This dichotomous test score was used in a regression equation to predict
the certainty index. The predicted certainty index was then subtracted from the
actual certainty index to obtain a residual certainty index. This residual certainty
index constituted a "pure" measure of certainty. This pure certainty
index was partialled out of the probabilistic test scores using the same method
as that used to partial the dichotomous test scores out of the original certainty
index: the pure certainty index was used to predict the probabilistic
test score, and the predicted probabilistic test score was then subtracted from the
probabilistic test score to obtain a residual probabilistic test score that was
unassociated with the pure certainty index.
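
The two partialling steps can be sketched in Python as a pair of residualizations. The sketch assumes an ordinary least-squares regression with a single predictor, which is one plausible reading of "a regression equation"; the function names and the five-examinee data set are hypothetical and are not taken from the study.

```python
import math


def residualize(y, x):
    """Return y minus its least-squares linear prediction from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson product-moment correlation between two score lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical scores for five examinees (illustration only).
dichotomous = [12, 15, 20, 24, 29]
certainty = [0.40, 0.70, 0.60, 1.00, 1.10]
probabilistic = [10.5, 14.0, 17.5, 23.0, 27.5]

# Step 1: certainty with the dichotomous (knowledge) score partialled out.
residual_certainty = residualize(certainty, dichotomous)
# Step 2: probabilistic score with the residual certainty partialled out.
residual_score = residualize(probabilistic, residual_certainty)

print(pearson_r(residual_certainty, dichotomous))
print(pearson_r(residual_score, residual_certainty))
```

Both printed correlations are zero up to rounding error, which is a property of least-squares residuals rather than an empirical finding.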

As a result of these partialling operations, the following measures were
available for each of the five scoring methods:

1. Probabilistic test score. This score represents a confounding of knowledge
and certainty.

2. Dichotomous test score. This score represents a pure knowledge index
and is the dichotomous scoring of the probabilistic responses.

3. Residual score. This score is the probabilistic test score with the
pure certainty index partialled out, and thus represents the pure knowledge
component of the probabilistic scores.

4. Certainty index. This measure represents a confounding of knowledge and
certainty.

5. Residual certainty index. This measure is the certainty index with the
pure knowledge index (the dichotomous test score) partialled out and
thus represents a pure certainty index.

Evaluative Criteria 

Reliability and validity coefficients were computed for both the probabilistic
and the residual test scores. The reliability coefficients were internal
consistency reliability coefficients calculated using coefficient alpha. The
validity coefficients were the correlations between test score and reported
grade-point average. For each of the five scoring methods used, the validity
and reliability of the residual scores were compared with those of the original
probabilistic test scores. If there were any differences between the validities
and the reliabilities of the probabilistic and the residual scores, they could
be attributed to the effect of certainty in responding, since the only difference
between the two scores was that the effect of certainty had been removed
from the residual scores.
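
For readers who want to see the reliability index concretely, coefficient alpha can be computed from a matrix of item scores as sketched below. The sketch uses the standard formula with sample (n - 1) variances; the function name and the small data set are hypothetical.

```python
def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of examinees' item-score lists.

    alpha = k / (k - 1) * (1 - sum of item variances / total variance),
    where k is the number of items and variances use n - 1.
    """
    k = len(item_scores[0])

    def variance(values):
        n = len(values)
        mean = sum(values) / n
        return sum((v - mean) ** 2 for v in values) / (n - 1)

    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(k)
    ]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Three hypothetical examinees answering two items.
scores = [[1, 2], [2, 1], [3, 3]]
print(coefficient_alpha(scores))
```

The same function applies to probabilistic item scores (for example, PACA scores between 0 and 1.00) and to residual item scores, since alpha only requires numeric item scores.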

Factor analyses of the item scores (both probabilistic and residual) for
each of the five scoring formulas were performed using a principal axis factor
extraction method. The number of factors extracted for each of the scoring formulas
was determined through parallel analyses (Horn, 1965) performed separately
for each scoring formula, using randomly generated data with the same numbers of
items and examinees as the real data and with item difficulties (proportion correct)
equated with the real data. Coefficients of congruence and correlations
between factor loadings for each of the five scoring formulas were computed.



Results 

Score Intercorrelations

Correlations between probabilistic test scores, residual test scores, dichotomous
scores, the certainty index, and the residual certainty index for each
of the scoring formulas are presented in Table 1. Since the AIKEN scoring formula
resulted in item scores and correlations that were identical to those of the
PACA scoring formula, only the PACA results are reported.
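
The identity of the AIKEN and PACA item scores follows from Equations 8 and 9 whenever the responses to an item sum to exactly 1.00. As a brief worked derivation (writing $c$ for the index of the correct alternative, a symbol introduced here for illustration, so that $p_{cc} = 1$ and $p_{ci} = 0$ otherwise):

$$D = \sum_{i=1}^{m} \left| p_{ai} - p_{ci} \right| = (1 - p_{ac}) + \sum_{i \neq c} p_{ai} = (1 - p_{ac}) + (1 - p_{ac}) = 2(1 - p_{ac}),$$

so that

$$\text{Item score} = 1 - \frac{D}{D_{\max}} = 1 - \frac{2(1 - p_{ac})}{2.00} = p_{ac}.$$

For a response totaling 99 or 101 points the two scores would differ by a constant of .005 under this algebra, so the exact identity presumably reflects responses that summed to 100 or were rescaled; this is an inference and is not stated in the text.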

As expected, due to the partialling procedure, the correlation between the
residual certainty index and the dichotomous score, and the correlation between
the residual certainty index and the residual score, were both zero for all
scoring methods. The correlation between the original certainty index and the
dichotomous score (.71), and the correlation between the original certainty index
and the residual certainty index (.71), were exactly the same for all four
scoring formulas. This is due to the fact that the three indices (the original
certainty index, the residual certainty index, and the dichotomous score) do not






Table 1

Intercorrelations of Scores for Multiple-Choice Items with a
Probabilistic Response Format Scored by Four Scoring Methods

Scoring Method         Probabi-   Dichot-                Residual    Residual
and Score              listic     omous     Certainty    Certainty   Score

QUAD (lower triangle) and SPHER (upper triangle)
  Probabilistic          --       .94**      .64**       -.04        1.00**
  Dichotomous           .91**      --        .71**        .00         .94**
  Certainty             .56**     .71**       --          .71**       .67**
  Residual Certainty   -.12*      .00        .71**         --        -.00
  Residual Score        .99**     .92**      .65**        .00          --

TLOG (lower triangle) and PACA (upper triangle)
  Probabilistic          --       .93**      .83**        .24**       .97**
  Dichotomous           .85**      --        .71**        .00         .96**
  Certainty             .43**     .71**       --          .71**       .68**
  Residual Certainty   -.25**     .00        .71**         --        -.00
  Residual Score        .97**     .88**      .62**        .00          --

*p < .05
**p < .01

change with the particular scoring formula used; they are constant for each individual
across scoring methods. These two significant correlations, along with
the significant correlations exhibited for each of the scoring formulas between
the certainty index and the residual score (.65, .67, .62, and .68 for QUAD,
SPHER, TLOG, and PACA, respectively), show that the original certainty index is
indeed related to both "knowledge" as measured by traditional multiple-choice
tests (the dichotomous scores) and "certainty" unconfounded with "knowledge"
(the residual certainty index).

The correlations between the probabilistic test scores and the dichotomous
test scores were .91, .94, .85, and .93 for the QUAD, SPHER, TLOG, and PACA
scoring methods, respectively. Using approximate significance tests for correlations
obtained from dependent samples (Johnson & Jackson, 1959, pp. 352-358),
all of the pairwise differences among these correlations were significant
at the .05 level. Practically, the only
correlation of these four that appears different from the others is that of TLOG
(.85 as opposed to .91, .94, and .93 for the other scoring methods). Squaring
these four correlations yields the proportion of variance in the probabilistic
test scores accounted for by the dichotomous test scores. The squared correlations
are .83, .88, .72, and .86 for the QUAD, SPHER, TLOG, and PACA scoring
procedures, respectively.
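
The squared correlations quoted above can be checked with a short calculation. The sketch below simply squares the four reported correlations and rounds to two decimals; the dictionary layout and variable names are illustrative choices.

```python
# Reported correlations between probabilistic and dichotomous test scores.
correlations = {"QUAD": 0.91, "SPHER": 0.94, "TLOG": 0.85, "PACA": 0.93}

# Proportion of variance in one score accounted for by the other is r squared.
shared_variance = {
    method: round(r * r, 2) for method, r in correlations.items()
}

for method, r_squared in shared_variance.items():
    print(method, r_squared)
```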

The correlations between the residual certainty index (the "pure" certainty
measure) and the probabilistic test scores were -.12, -.04, -.25, and .24 for
the QUAD, SPHER, TLOG, and PACA scoring formulas, respectively. The correlations
for the QUAD and SPHER scoring formulas were not significantly different
from zero at the .01 level of significance and thus do not account for significant
amounts of the variance of the probabilistic test scores. Squaring the
correlations that are significantly different from zero results in squared correlations
of .06 for both the TLOG and PACA scoring formulas. Thus, certainty
as measured by the residual certainty index accounts for no more than 6% of the
variance of any of the probabilistic test scores.

The correlations in Table 1 between the probabilistic test scores and the
residual scores are very high for all four scoring formulas (.99, 1.00, .97, and
.97 for QUAD, SPHER, TLOG, and PACA, respectively). These correlations are
highest (.99 and 1.00) for the QUAD and SPHER scoring formulas, whose correlations
between the probabilistic test score and residual certainty index were not
significantly different from zero (-.12 and -.04); these correlations squared
(.98 and 1.00) show that almost all of the variance in the QUAD probabilistic
test scores, and all of the variance of the SPHER probabilistic test scores, is
accounted for by the residual scores (representing "knowledge" concerning the
items).

The correlations between the dichotomous test scores and the residual
scores are high and significantly different from zero for all of the scoring
formulas (.92, .94, .88, and .96 for the QUAD, SPHER, TLOG, and PACA scoring formulas,
respectively). This result is expected, since both the residual scores and
the dichotomous scores are relatively pure measures of knowledge.

It was also expected that the correlations between the original certainty
index and the probabilistic test scores for the various scoring methods would be
greater than the correlations between this certainty index and the dichotomous
scores, since the probabilistic test scores and the original certainty index
both represent a confounding of certainty and knowledge, while the dichotomous
scores are a measure of knowledge less confounded by certainty. This occurred
only for the PACA scoring method, which was the only scoring method that was not
an RSS. The correlation between the certainty index and probabilistic test
score was significantly greater than the correlation between the dichotomous
score and the certainty index (.83 vs. .71) for the PACA scoring formula, and was
significantly less (using the dependent samples test of significance for correlations
and a .05 level of significance) than .71 (.56, .64, and .43) for the
other three scoring formulas.

Validity and Reliability

Table 2 shows the validity and internal consistency reliability coefficients
for the probabilistic test scores obtained from the various methods of
scoring the multiple-choice items with a probabilistic response format. The
validity coefficients were all significantly different from zero but were not
significantly different from each other, using a dependent samples test of significance
for correlation coefficients (Johnson & Jackson, 1959, pp. 352-358)
and maintaining the experimentwise error at a .01 alpha level.

The reliability coefficients were all significantly different from zero and
significantly different from each other (using the Pitman procedure described in
Feldt, 1980, for testing the significance of differences between coefficient
alphas for dependent samples, with a .01 significance level). The PACA scoring
method yielded the highest internal consistency reliability (.91), followed by
SPHER (.88), QUAD (.87), and TLOG (.84).



Table 2

Validity Correlations of Test Scores with
Reported GPA and Alpha Internal Consistency
Reliability Coefficients for Multiple-Choice Items
with a Probabilistic Response Format (N=299)

                          Validity           Reliability
Scoring Method            r       p*         alpha    p*

Unpartialled Scores
  Quadratic RSS          .18     <.001       .87     <.001
  Spherical RSS          .18     <.001       .88     <.001
  Truncated Log RSS      .18     <.001       .84     <.001
  PACA                   .17     <.001       .91     <.001

Residual Scores
  Quadratic RSS          .13      .011       .87     <.001
  Spherical RSS          .13      .011       .88     <.001
  Truncated Log RSS      .14      .006       .84     <.001
  PACA                   .12      .017       .91     <.001

*Probability of rejecting null hypothesis of no
significant difference from zero.



Validity and internal consistency reliability coefficients for the residual
scores are also shown in Table 2. The reliability coefficients for the residual
scores are exactly the same as the reliability coefficients for the probabilistic
test scores. The validity coefficients for the residual scores were all
significantly different from zero but not from each other (.01 significance level),
and these validity coefficients were significantly lower (p < .05) for the
residual scores than for the unpartialled probabilistic test scores (.18 vs. .13
for QUAD, .18 vs. .13 for SPHER, .18 vs. .14 for TLOG, and .17 vs. .12 for
PACA). This decrease in the magnitude of the validity coefficients of the residual
scores is not due to a restriction in range problem, since the range of
scores for the probabilistic test scores was very similar to that of the residual
scores, as is shown in Table 3.



Table 3

Range of Scores for Probabilistic and
Residual Test Scores

Scoring Method       Probabilistic    Residual

Quadratic               27.21          27.30
Spherical               16.57          16.56
Truncated Log           13.14          12.74
PACA                    20.69          20.10






Factor Analysis of Probabilistic Test Scores

Factor analyses of the unpartialled probabilistic and residual test scores
yielded virtually identical results; therefore, only the results of the factor
analyses of the probabilistic test scores are reported here.



Figures 1a to 1d show the results of the parallel analyses performed for
each of the scoring methods (numerical data are in Appendix Table C). The eigenvalues
obtained from the principal axes factor analysis of the random data
were all low; as expected, no factor accounted for significantly more variation
in the items than any other factor. In comparing the eigenvalues of the actual
data with those from the random data, it is clear that one strong factor is present
for all of the scoring methods. A second factor also appears for each of
the scoring methods with eigenvalues greater than that of the second factor for
the random data, but the eigenvalues for the second factors of the random and
actual data are so close that the second factor (and third factor for TLOG) for
the actual data can be considered to be the same strength as a random factor.
On the basis of these results, one-factor principal axis factor solutions were
obtained for each of the scoring methods and are shown in Table 4.

The factor loadings in Table 4 are positive and fairly high for all items
and all scoring formulas, indicating a global factor for each of the scoring
methods. The magnitudes of the eigenvalues show that this factor accounted for
more of the variance of the item responses for the PACA scoring formula (26%)
than for any of the other scoring formulas (19.9%, 20.9%, and 17.4% for the
QUAD, SPHER, and TLOG scoring formulas).
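
The link between factor loadings and percentage of variance can be illustrated with a short sketch. It assumes that the percentages are expressed relative to the total variance of standardized items, so that the eigenvalue of a single factor (the sum of its squared loadings) is divided by the number of items; that convention and the function name are assumptions of this sketch, and the loadings used are hypothetical.

```python
def percent_variance_explained(loadings):
    """Percentage of total item variance accounted for by one factor.

    ASSUMPTION: items are standardized, so total variance equals the
    number of items and the factor's eigenvalue is the sum of its
    squared loadings.
    """
    eigenvalue = sum(loading * loading for loading in loadings)
    return 100.0 * eigenvalue / len(loadings)


# Thirty hypothetical items that all load .50 on the factor.
print(percent_variance_explained([0.5] * 30))
```

Under this convention, a factor on which all 30 items load about .50 would account for about 25% of the variance, which is close to the figure reported for PACA.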

The correlations between factor loadings across the 30 items for the various
scoring methods are presented in the lower left triangle of Table 5, while
coefficients of congruence are reported in the upper right triangle of Table 5.
The coefficients of congruence are at the maximum of 1.00 for all of the pairs
of factor loadings and the correlations among all of the factor loadings are
very high, except for the correlation between the factor loadings for the PACA
and TLOG scoring methods, which was only .80. The fact that all of the coefficients
of congruence are equal to the maximum value for this index is due to the
dependence of this index upon the magnitude and sign of the factor loadings.
Gorsuch (1974, p. 254) notes that this index will be high for factors whose
loadings are approximately the same size even if the pattern of loadings for the
two factors is not the same.
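
The point attributed to Gorsuch can be illustrated numerically. The sketch below computes the coefficient of congruence in its usual form (sum of cross-products divided by the square root of the product of the sums of squares) alongside the Pearson correlation; the two loading vectors are hypothetical and are not taken from Table 4.

```python
import math


def congruence(a, b):
    """Coefficient of congruence between two sets of factor loadings."""
    cross = sum(x * y for x, y in zip(a, b))
    return cross / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def pearson_r(a, b):
    """Pearson correlation between two sets of factor loadings."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical loadings: similar in size and sign, opposite in pattern.
factor_1 = [0.40, 0.45, 0.50, 0.55]
factor_2 = [0.55, 0.50, 0.45, 0.40]

print(congruence(factor_1, factor_2))
print(pearson_r(factor_1, factor_2))
```

Here the congruence coefficient is above .97 even though the correlation between the two sets of loadings is -1, because congruence does not remove the mean level of the loadings.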



Discussion and Conclusions

The Influence of Certainty

The evidence concerning the effect of examinee certainty on probabilistic
test scores suggests that certainty as a response style variable has a small,
almost negligible effect on the probabilistic test scores obtained in this
study. The reliability coefficients for the five scoring methods were exactly
the same for the probabilistic and residual test scores, indicating that the
certainty variable was not contributing reliable variance to the probabilistic
test scores and was not artificially increasing the reliability coefficients. The
factor structures of the probabilistic test scores and the residual test scores




Figure 1
Eigenvalues from Parallel Analysis of Random Data
and Actual Data for QUAD, SPHER, PACA, and TLOG Scoring Methods

[Figure not reproduced. Panels: (a) QUAD, (b) SPHER, (c) PACA, (d) TLOG. Each
panel plots eigenvalues (vertical axis) against factor number, 1 to 15
(horizontal axis), for the actual data and for random data.]



Table 4
Factor Loadings on the First Factor
for Multiple-Choice Items with a
Probabilistic Response Format

Item                  Scoring Method
Number        QUAD     SPHER    TLOG     PACA
 1            .418     .433     .382     .490
 2            .446     .438     .412     .493
 3            .439     .456     .409     .476
 4            .439     .435     .358     .526
 5            .233     .264     .165     .347
 6            .429     .443     .396     .528
 7            .332     .358     .316     .412
 8            .424     .428     .413     .505
 9            .324     .354     .259     .469
10            .426     .414     .391     .500
11            .383     .377     .355     .445
12            .538     .529     .509     .585
13            .513     .513     .519     .566
14            .444     .441     .422     .483
15            .368     .384     .341     .414
16            .465     .512     .469     .543
17            .543     .537     .487     .586
18            .505     .484     .546     .509
19            .316     .338     .244     .445
20            .490     .492     [?]      .507
21            .552     .552     .491     .597
22            .544     .571     .518     .624
23            .498     .503     .463     .527
24            .472     .505     .394     .553
25            .400     .422     .380     .466
26            .437     .466     .406     .517
27            .514     .505     .508     .520
28            .524     .515     .473     .571
29            .406     .423     .349     .488
30            .387     .453     .370     .514
Eigenvalue    5.98     6.27     5.22     7.81



Table 5
Correlations (Lower Triangle) and Coefficients
of Congruence (Upper Triangle) Between
Factor Loadings Obtained for Four Scoring Methods

Scoring
Method      QUAD     SPHER    TLOG     PACA
QUAD         --      1.00     1.00     1.00
SPHER       .97       --      1.00     1.00
TLOG        .95      .92       --      1.00
PACA        .90      .93      .80       --



were also identical. The factor structure and internal consistency reliability
data (which are both based upon the interitem correlations for each scoring
method) indicate no effect of the certainty variable on probabilistic test
scores above and beyond the effect on the residual test scores (i.e., the
probabilistic test scores with the "pure" certainty index partialled out). This
lack of effect is demonstrated by the extremely high correlations between the
scores derived assuming conventional multiple-choice instructions (the
dichotomous score) and the probabilistic test scores for all of the scoring
methods studied, and by the extremely low correlations between the "pure"
certainty index (the residual certainty index) and the probabilistic test
scores for each scoring method. Since the dichotomous test scores simulate
testing conditions under conventional multiple-choice instructions to choose
the one correct answer, these high correlations suggest that the greatest
portion of the variability in the probabilistic test scores for all of the
scoring formulas is not different from that present in scores obtained with
traditional multiple-choice tests.
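
As a rough illustration of what "partialling out" an index means, the sketch below removes the linear effect of one variable from another by simple least squares and keeps the residuals. It is only an assumed reading of the procedure: the data are invented, and the names `certainty_index`, `prob_scores`, and `residualize` are hypothetical rather than taken from the study.

```python
import math


def residualize(y, x):
    """Return y with the linear effect of x removed (simple least squares)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Invented data for six examinees.
certainty_index = [0.2, 0.5, 0.4, 0.9, 0.7, 0.3]
prob_scores = [12.0, 18.0, 15.0, 20.0, 22.0, 11.0]

residual_scores = residualize(prob_scores, certainty_index)

print(round(pearson(prob_scores, certainty_index), 3))
print(abs(pearson(residual_scores, certainty_index)) < 1e-9)
```

By construction the residual scores are uncorrelated with the index that was partialled out, so any reliability or validity they retain cannot be attributed to it.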

The validity coefficients did show an effect of the certainty index on the
probabilistic test scores. The significant decrease in the validity
coefficients which occurs when the "pure" certainty index is partialled from
the probabilistic test scores is evidence of some effect of the certainty
variable on the probabilistic test scores. However, even though the decrease
was significant for all of the scoring formulas, the practical difference was
small. The validity coefficients of the probabilistic test scores were all low
initially, since the reported GPA criterion is a complex variable not easily
predicted by a single factor of analogical reasoning. Although reported GPA
might not have been a true reflection of actual GPA (although Thompson and
Weiss, 1980, data show a correlation of .59 between the two), this invalidity
should not have affected the comparisons made in this study. Additional
research utilizing different criterion measures is recommended to further
investigate the generality of the results obtained here.

Other than the small effect of the certainty variable on the validity
coefficients for each of the scoring formulas, there appears to be no effect of
the certainty variable on the probabilistic test scores. However, since not all
of the variance in the probabilistic test scores can be accounted for by the
"pure" knowledge and certainty indices, there may be some other response style
variable that exerts an influence upon the probabilistic test scores. This
influence would have to be extremely small, though, since the knowledge and
certainty indices accounted for 88%, [?]4%, 78%, and 92% of the variance in the
scores obtained from the spherical, quadratic, truncated log, and PACA scoring
formulas, respectively.

Choice among Scoring Methods

The choice among the five scoring methods must be made on the basis of
validity coefficients, the reliability coefficients, and the factor analysis
results. Since there were no significant differences between any of the
validity coefficients, these coefficients do not provide support for any one
scoring method. In terms of the reliability coefficients, the PACA (and its
equivalent AIKEN) scoring formula yielded scores having the highest reliability
coefficients of all of the scoring methods.






The dependence of both the internal consistency reliability coefficient and
the one-factor solution on the interitem correlation suggests that scores from
the scoring formulas with the highest reliability coefficients would also have
the strongest first factors, and this is exactly what occurred in this study.
Hypothesizing that the factor extracted represents verbal ability, it is
desirable that this factor account for as large a proportion of each item's
variance as possible. The factor contribution of this first factor was greater
for the two scoring methods that are not reproducing scoring systems (PACA and
AIKEN) than for the three scoring methods that are reproducing scoring systems.
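
The link between interitem correlations and internal consistency can be sketched with the standardized form of coefficient alpha (the Spearman-Brown expression in terms of the mean interitem correlation). This is an illustration only: the report's coefficients may have been computed from raw item variances, and the mean correlations used below are hypothetical values, not results from the study.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized coefficient alpha from the mean interitem correlation."""
    return n_items * mean_r / (1.0 + (n_items - 1) * mean_r)


N_ITEMS = 30  # the test in this study had 30 items

# Hypothetical mean interitem correlations, for illustration only.
for mean_r in (0.15, 0.20, 0.25):
    alpha = standardized_alpha(N_ITEMS, mean_r)
    print(f"mean r = {mean_r:.2f} -> alpha = {alpha:.3f}")
```

Because both alpha and the size of the first factor rise with the average interitem correlation, a scoring method that raises one would be expected to raise the other.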

On the basis of these results, either the PACA or AIKEN scoring methods can
be recommended for use with multiple-choice items with a probabilistic response
format. Since PACA is the simpler of the two methods, it might be the
preferable scoring method.


Conclusions 

Test scores obtained from the five methods of scoring multiple-choice items
with a probabilistic response format do not appear to be affected by the
response style or personality variable of examinee certainty to a greater
degree than scores obtained under traditional multiple-choice instructions. The
scoring method used does not affect the validity of the test scores but does
appear to affect the internal consistency of the scores. Test scores obtained
using the PACA scoring method were more reliable, simpler to compute, and as
valid as those obtained from the other scoring methods; therefore, use of the
PACA scoring method is recommended for these types of items.

As a note of caution, however, one of the three reproducing scoring systems
might have a practical advantage over either the PACA or AIKEN scoring
formulas. In a situation where examinees were aware of the scoring formula to
be used and where the scores were of some importance to the examinee (as for a
classroom grade or selection procedure), the examinees could optimize their
test score using the reproducing scoring systems only by responding according
to their actual beliefs in the correctness of each alternative, while their
total scores could be maximized with the PACA scoring formula by assigning the
maximum probability of 1.00 to the one alternative they thought was the correct
one. If examinees were expected to utilize this strategy, one of the
reproducing scoring systems would be better to use with multiple-choice items
with a probabilistic response format. Test scores obtained from the spherical
reproducing scoring system were more reliable, as valid, and showed a stronger
first factor than scores from the other reproducing scoring systems. Thus, if
the practical situation requires use of a reproducing scoring system, the
spherical RSS should be used.
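
The strategic difference described above can be sketched numerically. The code below uses common textbook forms of the quadratic, spherical, and truncated logarithmic rules, plus a linear rule (the probability given to the keyed alternative) as a stand-in for PACA. These forms, the truncation floor of 0.01, and the belief vector are assumptions for illustration; the report's own formulas may differ in scaling.

```python
import math


def quadratic(report, correct):
    """Quadratic rule: 2 * p_correct minus the sum of squared probabilities."""
    return 2 * report[correct] - sum(p * p for p in report)


def spherical(report, correct):
    """Spherical rule: p_correct divided by the length of the response vector."""
    return report[correct] / math.sqrt(sum(p * p for p in report))


def truncated_log(report, correct, floor=0.01):
    """Logarithmic rule, truncated so that log(0) never occurs (floor assumed)."""
    return math.log(max(report[correct], floor))


def linear(report, correct):
    """Linear rule: the probability assigned to the correct alternative."""
    return report[correct]


def expected_score(rule, report, belief):
    """Expected score of a report when each alternative k is correct with
    probability belief[k]."""
    return sum(belief[k] * rule(report, k) for k in range(len(belief)))


belief = [0.6, 0.2, 0.1, 0.1]   # hypothetical true beliefs of an examinee
honest = belief                 # report exactly what is believed
all_in = [1.0, 0.0, 0.0, 0.0]   # put everything on the most likely answer

for rule in (quadratic, spherical, truncated_log, linear):
    print(
        rule.__name__,
        round(expected_score(rule, honest, belief), 3),
        round(expected_score(rule, all_in, belief), 3),
    )
```

Under these assumptions the honest report has the higher expected score for the three reproducing rules, while the all-or-nothing report has the higher expected score for the linear rule.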






References

Aiken, L. R. Scoring for partial knowledge on the generalized rearrangement
item. Educational and Psychological Measurement, 1970, 30, 87-94.

Bentler, P. M. Alpha-maximized factor analysis: Its relation to alpha and
canonical factor analysis. Psychometrika, 1968, 33, 335-345.

Christie, R., Havel, J., & Seidenberg, B. Is the F scale irreversible? Journal
of Abnormal and Social Psychology, 1958, 56, 143-159.

Coombs, C. H. On the use of objective examinations. Educational and
Psychological Measurement, 1953, 13, 308-310.

Coombs, C. H., Milholland, J. E., & Womer, F. B. The assessment of partial
knowledge. Educational and Psychological Measurement, 1956, 16, 13-37.

de Finetti, B. Methods for discriminating levels of partial knowledge
concerning a test item. British Journal of Mathematical and Statistical
Psychology, 1965, 18, 87-123.

Dressel, P. L., & Schmid, J. Some modifications of the multiple-choice item.
Educational and Psychological Measurement, 1953, 13, 574-595.

Dunnette, M. D., & Hogatt, A. C. Deriving a composite score from several
measures of the same attribute. Educational and Psychological Measurement,
1957, 17, 423-434.

Echternacht, G. J., Sellman, W. S., Boldt, R. F., & Young, J. D. An evaluation
of the feasibility of confidence testing as a diagnostic aid in technical
training (RB-71-51). Princeton NJ: Educational Testing Service, October
1971.

Echternacht, G. J., Boldt, R. F., & Sellman, W. S. Personality influences on
confidence test scores. Journal of Educational Measurement, 1972, 9,
235-241.

Feldt, L. S. A test of the hypothesis that Cronbach's alpha reliability
coefficient is the same for two tests administered to the same sample.
Psychometrika, 1980, 45, 99-105.

Gilman, D. A., & Ferry, P. Increasing test reliability through self-scoring
procedures. Journal of Educational Measurement, 1972, 9, 205-207.

Gorsuch, R. L. Factor analysis. Philadelphia: W. B. Saunders Company, 1974.

Guilford, J. P. A simple scoring weight for test items and its reliability.
Psychometrika, 1941, 6, 367-374.

Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

Hansen, R. The influence of variables other than knowledge on probabilistic
tests. Journal of Educational Measurement, 1971, 8, 9-14.

Hendrickson, G. F. An assessment of the effect of differential weighting
options within items of a multiple-choice objective test using a Guttman-type
weighting scheme. Unpublished doctoral dissertation, The Johns Hopkins
University, 1970.

Hopkins, K. D., Hakstian, A. R., & Hopkins, B. R. Validity and reliability
consequences of confidence weighting. Educational and Psychological
Measurement, 1973, 33, 135-141.

Horn, J. L. A rationale and test for the number of factors in factor analysis.
Psychometrika, 1965, 30, 179-186.

Horst, P. Obtaining a composite measure from a number of different measures of
the same attribute. Psychometrika, 1936, 1, 53-60.

Jacobs, S. S. Correlates of unwarranted confidence in responses to objective
test items. Journal of Educational Measurement, 1971, 8, 15-20.

Johnson, P. O., & Jackson, R. W. Modern statistical methods: Descriptive and
inductive. Chicago: Rand McNally & Co., 1959.

Kane, M. T., & Moloney, J. M. The effect of SSM grading on reliability when
residual items have no discriminating power. Paper presented at the annual
meeting of the National Council on Measurement in Education, April 1974.

Koehler, R. A. A comparison of the validities of conventional choice testing
and various confidence marking procedures. Journal of Educational
Measurement, 1971, 8, 297-303.

Koehler, R. A. Overconfidence on probabilistic tests. Journal of Educational
Measurement, 1974, 11, 101-108.

Lord, F. M., & Novick, M. R. Statistical theories of mental test scores.
Reading MA: Addison-Wesley, 1968.

Pugh, R. C., & Brunza, J. J. The contribution of selected personality traits
and knowledge to response behavior on a probabilistic test. Paper presented
at annual meeting of the American Educational Research Association,
Chicago IL, April 1974.

Rippey, R. Probabilistic testing. Journal of Educational Measurement, 1968, 5,
211-216.

Rippey, R. M. A comparison of five different scoring functions for confidence
tests. Journal of Educational Measurement, 1970, 7, 165-170.

Shuford, E. H., Albert, A., & Massengill, H. E. Admissible probability
measurement procedures. Psychometrika, 1966, 31, 125-145.

Slakter, M. J. Risk taking on objective examinations. American Educational
Research Journal, 1967, 4, 31-43.

Stanley, J. C., & Wang, M. D. Weighting test items and test-item options: An
overview of the analytical and empirical literature. Educational and
Psychological Measurement, 1970, 30, 21-35.

Terwilliger, J. S., & Anderson, D. H. An empirical study of the effects of
standardizing scores in the formation of linear composites. Journal of
Educational Measurement, 1969, 6, 145-154.

Thompson, J. G., & Weiss, D. J. Criterion-related validity of adaptive testing
strategies (Research Report 80-3). Minneapolis MN: University of Minnesota,
Department of Psychology, Psychometric Methods Program, Computerized
Adaptive Testing Laboratory, June 1980.

Wang, M. W., & Stanley, J. C. Differential weighting: A review of methods and
empirical studies. Review of Educational Research, 1970, 40, 663-705.

Wesman, A. G., & Bennett, G. K. Multiple regression vs. simple addition of
scores in prediction of college grades. Educational and Psychological
Measurement, 1959, 19, 243-246.

Wilks, S. S. Weighting systems for linear functions of correlated variables
when there is no dependent variable. Psychometrika, 1938, 3, 23-40.

Wood, R. L., Wingersky, M. S., & Lord, F. M. LOGIST: A computer program for
estimating examinee ability and item characteristic curve parameters
(RM-76-6). Princeton NJ: Educational Testing Service, 1976.






Appendix:
Supplementary Tables

Table A
IRT Item Parameters for
Multiple-Choice Analogy Items

Item
Number        a         b        c
310         .616     -.483      .20
273         .627     2.062      .20
275         .652     1.617      .21
286         .673     2.407      .09
327         .693     1.129      .22
399         .722      .446      .24
419         .750     2.413      .16
278         .770     2.002      .17
266         .815     1.670      .38
271         .828     1.266      .09
268         .844     1.036      .17
392         .865     -.360      .20
492         .914     -.145      .12
331         .930     1.352      .20
578         .946      .271      .20
405         .983      .739      .16
323        1.005      .828      .20
394        1.006     -.153      .20
277        1.041     1.930      .17
335        1.075     1.525      .20
575        1.098      .197      .25
560        1.132     -.007      .27
452        1.156     -.341      .30
493        1.172      .076      .26
576        1.211      .633      .20
415        1.234     1.183      .24
322        1.232      .960      .17
250        1.288      .513      .17
284        1.357     2.232      .24
339        1.608     1.818      .17
Mean        .975      .961      .20
SD          .244      .887      .06
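
The three parameters in Table A are those of the three-parameter logistic (3PL) model: discrimination (a), difficulty (b), and lower asymptote (c). The sketch below shows how such parameters translate into a probability of a correct response at a given ability level. The scaling constant D = 1.7 is an assumption (it is the conventional value for logistic models scaled to the normal ogive), and the function name is illustrative.

```python
import math


def p_correct(theta, a, b, c, D=1.7):
    """3PL probability of a correct response at ability theta.

    a: discrimination, b: difficulty, c: lower asymptote (guessing level).
    D is a scaling constant; 1.7 is assumed here.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Item 310 from Table A: a = .616, b = -.483, c = .20
item_310 = {"a": 0.616, "b": -0.483, "c": 0.20}

for theta in (-2.0, -0.483, 0.0, 2.0):
    print(f"theta = {theta:6.3f}  P = {p_correct(theta, **item_310):.3f}")
```

At theta equal to b the probability is halfway between c and 1, and for very low ability it approaches c rather than zero.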






Table B

Instructions Given Prior to Administration of Multiple-Choice
Items with a Probabilistic Response Format

Screen 29891*

That completes the introductory information.

Type "GO" and press "RETURN" for the instructions for
the first test.

Screen 29842*

This is a test of word knowledge. It is probably different
from other tests you have taken, so it is important to read
the instructions carefully to understand how to answer the
questions.

Each question consists of a pair of words that have a specific
relationship to each other, followed by four possible answers
consisting of pairs of words. One of these four pairs of
words has the same relationship as the first pair of words.

Type "GO" and press "RETURN" for an example.

Screen 29824*

For example:

Hot:Cold

1) Hard:Soft
2) House:Building
3) Mule:Horse
4) Yellow:Brown

Your job in this test is not to choose the correct answer
(the pair of words that has the same relationship as the first
pair of words) but to indicate your confidence that each of
the four answers is the correct answer.

Type "GO" and press "RETURN" to continue the instructions.

Screen 29804*

You indicate your confidence by distributing 100 points
among the four answers. The answer you think is the
correct one should get the highest number of points, and
the answer you feel is least likely to be the correct answer
should get the lowest number of points.

The more certain you are that an answer is the correct one,
the closer your response to that answer should be to 100.
The more certain you are that an answer is NOT the correct
one, the closer your response for that answer should be to 0.






If you are completely certain that one of the answers is the
correct answer, assign 100 to that answer and 0 to the other
answers for that question. If you are completely uncertain as
to which answer is correct, assign 25 to each of the four
answers.

Type "GO" and press "RETURN" to continue.

Screen 29805*

The numbers you distribute among the four answers must sum to
99 or 100. However, you can distribute the 100 points in any
way you like, as long as they reflect your certainty as to the
"correctness" of each answer.

To answer a question, type the numbers you assign to each
answer in a line in the order in which the answers appear in
the question. Separate each number by a comma.

Type "GO" and press "RETURN" for an example.

Screen 29825*

Going back to the sample question:

Hot:Cold

1) Hard:Soft
2) House:Building
3) Mule:Horse
4) Yellow:Brown

Suppose a person responded with the following numbers:
? 80,0,0,20

This person was:

a) fairly sure, but not completely certain, that
   the first answer (Hard:Soft) had the same
   relationship as the pair of words in the
   question and thus was the correct answer.

b) completely certain that answers "2" and "3"
   were NOT the correct choice.

c) unsure about whether or not the fourth answer
   was the correct answer, but felt that it was
   closer to being an incorrect answer than the
   correct answer.

Note that 80 + 0 + 0 + 20 = 100.

Type "GO" and press "RETURN" to continue the instructions.




Screen 29826*

Let's look at this question once more:

Hot:Cold

1) Hard:Soft
2) House:Building
3) Mule:Horse
4) Yellow:Brown

Suppose a person responded with the following numbers:
? 33,0,33,33

This person was:

a) completely certain that the second answer was NOT the
   correct answer.

b) unsure as to which of the remaining answers was correct
   and felt that any of the remaining three answers were
   equally likely to be the correct answer.

Type "GO" and press "RETURN" to continue the instructions.

Screen 29827*

As you can see, there is an almost endless variety of
combinations of numbers that you may use to state your
confidence in the four possible answers. Use the entire
range of numbers between 0 and 100 to express your
confidence. Remember also that the numbers you assign to
the four answers must sum to 99 or 100.

Please ask the proctor for help if you have any questions.

Type "GO" and press "RETURN" when you are ready to start
the test.

*This line is for identification only and was not displayed.
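
The response format described in these instructions (four comma-separated whole numbers that sum to 99 or 100) can be checked mechanically. The sketch below is a hypothetical validator, not the program used in the study; the function name, the error messages, and the final step of dividing by the total to obtain proportions are all assumptions.

```python
def parse_response(line, n_options=4):
    """Parse a response such as '? 80,0,0,20' into a list of proportions.

    Raises ValueError if the line does not follow the stated rules.
    """
    text = line.strip().lstrip("?").strip()  # drop an optional '?' prompt
    parts = text.split(",")
    if len(parts) != n_options:
        raise ValueError(f"expected {n_options} numbers, got {len(parts)}")
    points = [int(p) for p in parts]  # int() raises ValueError on non-integers
    if any(p < 0 or p > 100 for p in points):
        raise ValueError("each number must be between 0 and 100")
    total = sum(points)
    if total not in (99, 100):
        raise ValueError(f"numbers must sum to 99 or 100, got {total}")
    # Assumed convention: rescale so that the proportions sum to 1.
    return [p / total for p in points]


print(parse_response("? 80,0,0,20"))
print(parse_response("33,0,33,33"))
```

Allowing a total of 99 lets an examinee express equal confidence in three alternatives, as in the second example of the instructions.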






Table C

Eigenvalues for the First Fifteen Principal Factors
of Real and Random Data for Each Scoring Method

              QUAD            SPHER            TLOG            PACA
Factor    Real  Random    Real  Random    Real  Random    Real  Random
 1        6.38   1.01     6.67   1.00     5.65   1.02     8.16   1.04
 2        1.23    .96     1.23    .96     1.36    .95     1.32    .96
 3         .98    .93      .92    .94     1.21    .94      .96    .95
 4         .93    .89      .91    .90      .97    .90      .80    .89
 5         .84    .82      .81    .83      .89    .83      .71    .83
 6         .74    .79      .72    .80      .81    .79      .65    .81
 7         .68    .68      .71    .68      .73    .69      .60    .69
 8         .67    .66      .66    .67      .72    .68      .56    .67
 9         .63    .64      .61    .63      .63    .64      .55    .65
10         .5[?]  .59      .55    .61      .58    .59      .50    .61
11         .47    .57      .47    .57      .49    .57      .45    .57
12         .44    .53      .43    .53      .47    .53      .42    .53
13         .41    .47      .42    .48      .42    .48      .38    .48
14         .40    .44      .39    .43      .42    .44      .36    .44
15         .38    .41      .35    .40      .39    .41      .30    .41



Distribution Ust 



Navy 

I llafaon ScIquCUc 
Office of llAvsl ReMareh 
Branch Office, Londofi 

Fro Mew tork, WT OfSlO 

I tc, 4le«afider Bory 
Applied PeycNilogy 
HeacureMRt Dlviaioa 

HAS Pefiaai:oia« Ft 321109 

1 Dr. Stanley Co; Iyer 

Office of lUvAl T«cte«il««y 
800 N. Qutncy SctMt 
Arlington, VA 22217 

I CDR Nike Currao 
Office of Naval Reeearch 
aoo N. Quincy St* 
Code 170 

Arllti^too, VA 22217 

1 Mike Durveyer 

InaCructlonai Progiw Oevelopoent 

Building 90 

IfET-PDCO 

Great Ukea WTC, XL 600M 



I Or Wllllan Hoairague 
mmX Code 13 
San Mago, CA 921 » 

i Bill liordbrock 
1032 FalrlMi Ave. 
Ubertyvllla, IL 6004S 

1 Library, CcNle F20a 
Itovy Farsoiuu!! R&6 Center 
San Megot CA 92132 

1 Tectoical tHrector 
navy Feraonnel 8&0 Center 
gen Diego, CA 92152 

6 Cooaadlng Officer 

Naval BeMierck Laboratory 
Code 2627 

Waablogton, DC 20390 

1 Peycboioglcel Scloncea Divlalon 
Code 442 

Office of Haval Beeearch 
Arlington, fA 22217 

6 Peraotinel 4 treining Beeearch Croup 
Code 442PT 

Office of Ravel Beaearch 
Arlington, VA 22217 



i UK* rmi ri^imi^iMi 
Code Pn 
HPROC 

San Diego, CA 92152 

t Dr« Cathy F«ratfftdea 
Navy Persiianel R4D Center 
Sen Diego, CA 92152 

S Mr. Paul Foley 

Savy Perflunn«<l il4D Center 
San Diego, CA 92152 

1 Dr« John Ford 
Mavy Personnel R&D C«ntt$r 
San Diffgo, CA 92152 

1 Dr< Mormon j. <err 
C^icf of Maval Technical Training 
Maval Air Station NM^ia (75) 
Mllllngtoa, TM 38054 

t Or. Leonard Kroeker 

Navy Personnel R&D Center 
Sen 01«go, CA 92152 

1 Dr. VlUiaa L. Maloy (02) 
Chief of Maval Education and Traloii^ 
Mevnl Air Station 

Penadcols, Ft 32!0g 

1 Dr. Js«e8 McBrlde 

Havy porsotmel R4D C«ntt»r 
San Diego, CA 92152 

I Cdr Ralph McCunber 

Director, Beeearch 4 AnaXyela Dlvlelon 
Mavy Becniidng Connand 
4015 Wllaon Boulevard 
Arlington, VA 22203 

1 Dr. George Noeller 

Director, Behavioral Science^ Dept. 
Havel Subaarlne Medical Beaearch Lab 
Naval Suboarlne Baae 
Croton, CT 63409 



1 Psycbologlat 
Om Branch Ctf f ice 
1030 Saet Green Street 
Paaadene, CA 91101 

I Office of the Chief of Maval Operations 
Beaearch Developsent & Stu^lef drench 
OP 115 

Washington, DC 20350 

I LT Frank C. Petho, KSC, UBM (Fh.D) 
CMET (N-432) 
HAS 

Pensacola, FL 32506 

I Dr. Gary Foock 

Operations Beeearch Department 
Code 55FR 

Maval Postgraduate School 
ftonterey, CA 93940 

1 Dr. Bernard lUaland (OIC) 
Mavy Feraonael B4D Center 
San Diego, CA 92152 

I Dr. Carl Boss 

CI^-FDCD 
Building 90 

Great Lakes MTC, IL 400BB 

1 Dr* Iforth Scanland 
CMET (11-5) 

MAS, Fensacola. Ft 3250*1 

1 Dr. Robert G. Salth 
Office of Chief of Maval Operations 
0F-987H 

Vashlngcon, DC 20350 

1 Dr. Richard Sorenaen 
Ma^'y PeraoMiel BAD Center 
Sao Diego, CA 92152 

1 Dr. Frederick Stelnheieer 
CMO - 0FU5 
Mavy Aonea 
Arlington, VA 20370 



1 Hr. Brad Sym^on 
Navy Personntfl il4D Center 
Sao Diego, CA 92152 

1 Dr. Frank Vlcino 
Mavy Peraonn«!l R&D Center 
San Diego. CA 92152 

1 Or. Edward WegMS 
Office of Maval Baeaarch (Code 4ll84P> 
BC^ Mortb Quincy Street 
Arlington, VA 22217 

1 Dr. Ronald Ueltjssan 
Codr 54 WZ 

Departoent of Adalnletratlve Rcleacea 
U* S. Maval Foatgraduata School 
Mtoterey, CA 93940 

1 Dr. Douglas Metael 
Code 12 

Mavy Pdrsonnel R&D Center 
San Diego, CA 92152 

1 mt. MARTIM F. WISK^F 
RAVT FgRS(H9iBL R4 0 CSMTEA 
SAM DlEOD, CA 92152 

I Hr John B. Mblfe 
Mavy Personnel R&D Center 
San Diego, CA 92152 

Marine Corps 

1 H. Willie Greenup 
Education Advisor (E03l) 
Education Center, ICOEC 
(^tlco, VA 22134 

1 Director, Office of Hanpo««er Otlllsatlo 
HQ, Marine Corps (HPU) 
BCB, Bldg. 2009 
<}uantlco, VA 22134 

I Meadquarters. U. S. Marine Corps 
Codr NFl-20 
Washington, DC 201^ 

1 Special Assistant for Merlin 
Corps Natters 
Code lOQf 

Office of Naval Research 
FOO M. Quincy St. 
ArlingtcKi, VA 22217 

t m. A.L. SLAFROSRY 
SCIENTIFIC A{WIS(M( (CCm.RD-;) 
HQ, U.S. HARIME CORFS 
IIASHIMGTt)M, DC 20380 

1 Major Frank Tohannan, OSMC 
Headquarters, Uterine Corps 
4Code HPl-20) 
Washington, DC 20360 

Ansy 

1 Technical Director 
0. S. Amy Rei^arch Institute for the 
Behaviarsl and 5;orlal Sciences 
5001 gisenhowr Avenue 
Alexandria, VA 22333 

1 Dr. Myron Flschl 
O.S. Ansy Research Institute for the 
Social and Behavioral Sciences 
5001 Elseohau^r Av^^'nue 
Alexandria, VA 22333 



37 



I Dr« mitcm S. tmtM 
tralalfig Teetalcfti 4rM 
O.8. imf Research laitltiiCt 
9001 Bi«e^icMer Avtnut 
AtttXimdrU. ?A 22133 
/ 

I J&r. Harold F. O'lHiil, Jr. 
mr«€tor» TralQlop Reaearch Lab 
Any Raaearcli loatituCa 
3001 eiafiflboiMr Avmuo 
AXttMii4rla» 22331 

I Nr. Ibobert Ratt^ 
U.S. Amy lesearch loatltutt^ for thm 
Social and S0havi<iral Scl«ac«« 
3001 Els«titiow«!r hwfim 
Alexandria. VA 22333 

I Dr. Robert Saaaor 
0. S. Aray Re««arcli Uatitatii for tba 
iobavloral aad Social ScUocao 

^ 3001 Risanliowr Avonita 
Aioxandrta, VA 22333 

I Or. Joycm Sliialda 
hnj Roaearcb Inaeltuta for (He 
Bohavlora). and Social Sclai^aa 
3001 Sl«€^towr Avenua 
. AloaaodrU, ?A 22333 

1 Or. Rllda Wing 
A)niy R8««!Arch InatlCttte 
3001 ElaenHotfpr Ave. 
AleaatHlru. VA 22333 

1 Dr. Robert Wisher 
Anay Etfuearch Xoftltutc* 
Eia«n o%rer Av^noe 

AUxaiidcia, VA 22333 



Air Force 

I AFHRL/URS 

Attn: Saoan E«fitig 
t^AFB 

tfPAFB, m 4SA33 

t Air Force Huann Resoui -z^u tab 
AFHRL/NPD 

Brooks AFB. T% 782 H 

1 O.S. Air Forca Offico of Sciantiflc 
Raa<(arch 
Life Sciencea Directorate t Rt 
Boiling Air Forc« Base 
Waabingtco, DC W)32 

i Air Univaraity library 

MiL/LSfi 76/^43 

MaiMll Art, AL 36112 
■ 

I Or. Bari A, Alluiai 
BQ. ATRRt (AF^) 
BrocdM AFB» TX 78233 

1 Kr. Raymond E. ChriatAl 
ATHRL/KOE 

Brooks AFB, TX 78233 

I Or. Alfred R« Fregly 
ATOSR/NL 

Boiling AFB. DC 20332 

I Dr. Roger Fenoall 
Air Force Risaan Roaoorcea Laboratory 
Lovry m ^230 

I Dr. Mai cola Rea 
AFWUKF 

Brooba AFB, TX 78233 



Da^rtMOt of Rafom 

12 Dafeoae T^bnical laformatlcm Centor 
Camrmi.^atioo. RI^ 3 
Alexaodris, VA 22314 
Atcnt TC 

I Dr. William Gri^ 
Taatii^ Oirectorato 
fCFaRf/MSFCTHP 
rt. i^ridaa, IL 60037 

I Jerry Latmo 

RQ mKm 

ACtfii nsKT^ 

Fort Sboridaa, it 80037 

I Military AaBlBtMt for ttelol^ *aad 
PorBoaiwi TocWlogy 
Office of tba todor Sacret«ry of Dafeaa 
for Raaearch A fitgioaarlt^ 
Rooa 3D12V, tbo Pontagoii 
Wasbl^too« DC 20301 . ^ 

I Dr. Vlayfte SoIImi 
Office of tba Aaalataot Sftcretary • 
of Dof«M« (fOA A L) 
2R269 Tha Font.tgmt 
Waabiogtoa, DC 20301 

Civil lao AgcnciOB 

1 Dr. Helen J. Chriattip 
(Xticm of FeriMMl R&D 
IfW e St., W 

Office of Faraoiml IfeMgemot 
Waabingtoo, DC 20013 

I Dr. Vem V. Drry 
Feraoosel BAD Canter 
(tffice of ^raonoel M^ageMnt 
1900 E^StrMt IRI 
ttobington, DC 20413 

I Chief, Faycho logical Reaarch Branch 
0. S. Coaat Coard {C-^-l/2/TF«) 
tfaabim^too, DC 20393 

I Nr. tbowa A. Wara 
8. 8. Coaat Guard Xaatitate 
F. 0. Suba ration 18 
OblahoM City, OK 73189 

I Dr. Jo««|Ml L« fwmii Director 
nemoTj A Cognitive Frocoaaoa 
Ratie4ial Rciooca FouBd«tion 
Was!»iiHltan, DC 20330 

Private Sector 

I Dr. Jmmu Algioa 
Univemity of Florida 
Caineavilla, FL 326 

1 Dr. Srllna B. Anderaen 
Dtpartawnt of Statiatica 

8tudl4ittraede fi 
1433 CoFei^i^on 
imiARX 

I Paycho lexical Raaearch VInit 
Oa^. of Defeaae (Army Office) 
Cm^ll Farfc Officea 
Outerra ACT 26C.C 
A08TRALXA 

i Or. Uaac Be Jar 
Bdoeational Toatlns Sarvica 
Fri^otOQ. RJ cmso 



1 Capt. J. Jmm Relangar 
training OavvlofBaiit Diviaion 
Canadian F^r»a Traioi^ ^ataa 
CFi^aq, era Trmtm 
Astra. QnUrio, KK 
CAHADA 

1 Dr. Memacba Biranbatni 
School of BdocBtion 
T^ail Aviv Onivnraity 
Twl Aviv, Rmt Adtv 89978 
^ larael 

t Dr. Wbmar Blrke 
Daiwra %m StreltkraoftaaM 
Fostfoch 20 30 03 
D-3300 Roaa 2 
U6ST WBCAm 

I Dr. R. Oarrol Bock 
Htparcmont of Bdt^aCion 
Onlveraity of Chicago . 
Oiicago, n. 6063? 

I Nr. Arael4 Dolnror 
Soctim of FRycbolag&cal RaaaarcR 
CaMroa F^tita Oiatmu 
CMS 

1000 Braaaola 
Balgiua 

I Dr. Robort Rraaoau 
teatican Onlloge tbBting Program 
P. 0. Box 168 
to«s aty, lA 32243 

I Buodnittiateriaa dar Varteidtgimg 
Hl«ferat F IX 4- 
Faycbol^iCAl Servico 
Poatfacb 1328 
D*3^ Boon 1 
F. R. of CtenaaoF 

1 Dr. Ernest R. Cadotto 
K)7 Stokely 

Oniveralty of Tanneaaee 

KnoiEville, TV 37916 

I Dr. Nonsan Cliff 
Dept. of Psychology 
Dniv. of So. California 
Oaivaralty Fai% 
Loa Aflgelaa, CA «XX)7 

1 Dr. Hans Crrabag 
Education RoMarch Center 
Univoralty of Layden 
Boerhaavelaan 2 
2334 EN Leydea 
The RETRSRLAI^S 

I Dr. Kenneth 8. Croaa 
Aaacapa Sclancea. Inc. 
F«0« Dranor Q 
8«nta Barbara » CA 93102 

1 Dr. Walter Onnninghaa 
Univ<!rsiry of Mief>i 
DepartMnt of Faychology 
Caii^svilla, FL 32611 

1 Dr. OatCpradad Di vgi 
Byractiaa Doivaraity 
DapartKont of Psycbology 
Syrscnov 33210 

I Or. Frits Dra^^ 
Departseat of Faycholi^ 
Ihiivaraity of Xllinoia 
603 B. Daniel 8t* 
Chaapaign, IL 61820 



38 



I BRIC P«cilicy-Ac^i«iticHit 
4833 Rugby ilwnuii 

I Or. &«nJ«aiR A. Falrh«iyi, Jr. 
KcF«mi-CTa^ & A0M»€lac««« Inc. 
5a25 Caii4gfi«n 
Salte 225 

S«fs Anconto, TX 78228 

I Dr. Uronar^ fmldt 
tiiid^uific CefiMT for HiMttrwnt 
Universicy of Icwv 
low Cltf » U 52242 

t Dr, Rlcti«rd I. Ferguson 
TtH! Afl^rlesii CoIl«g« t&«tlil8 ProgrM 
P.O. 8o« W 
loM City, U 52240 

I Dr. Viccor Fiolds 
Dept. of Psychology 
Nontgosery College 
gockvlllfl, m 20850 

I Univ. Prof. Or« Gorlwrd FiacMr 
Litibigg.ifiAO 5/3 
\ IQIO VleitA* 
AUSTRIA 

I Profeft94ir DunAt4 FiCKg«r«l4 
University af ^ England 
Aralddii*, Men South Walov 2151 
AUSTRALIA 

1 Dr. Dexter Fletcher
WICAT Research Institute
1875 S. State St.
Orem, UT

1 Dr. John R. Frederiksen
Bolt Beranek & Newman
50 Moulton Street
Cambridge, MA 02138

1 Dr. Janice Gifford
University of Massachusetts
School of Education
Amherst, MA 01002

1 Dr. Robert Glaser
Learning Research & Development Center
University of Pittsburgh
3939 O'Hara Street
PITTSBURGH, PA 15260

1 Dr. Bert Green
Johns Hopkins University
Department of Psychology
Charles & 34th Street
Baltimore, MD 21218

1 DR. JAMES G. GREENO
LRDC
UNIVERSITY OF PITTSBURGH
3939 O'HARA STREET
PITTSBURGH, PA 15213

1 Dr. Ron Hambleton
School of Education
University of Massachusetts
Amherst, MA 01002

1 Dr. Delwyn Harnisch
University of Illinois
242b Education
Urbana, IL 61801

1 Dr. Lloyd Humphreys
Department of Psychology
University of Illinois
Champaign, IL 61820



1 Dr. Jack Hunter
2122 Coolidge St.
Lansing, MI 48906

1 Dr. Huynh Huynh
College of Education
University of South Carolina
Columbia, SC 29208

1 Dr. Douglas H. Jones
Room T-255/21-T
Educational Testing Service
Princeton, NJ 08541

1 Professor John A. Keats
University of Newcastle
N. S. W. 2308
AUSTRALIA

1 Dr. Scott Kelso
Haskins Laboratories, Inc.
270 Crown Street
New Haven, CT 06510

1 CDR Robert S. Kennedy
Canyon Research Group
1040 Woodcock Road
Suite 227
Orlando, FL 32803

1 Dr. William Koch
University of Texas-Austin
Measurement and Evaluation Center
Austin, TX 78703

1 Dr. Alan Lesgold
Learning R&D Center
University of Pittsburgh
3939 O'Hara Street
Pittsburgh, PA 15260

1 Dr. Michael Levine
Department of Educational Psychology
210 Education Bldg.
University of Illinois
Champaign, IL 61801

1 Dr. Charles Lewis
Faculteit Sociale Wetenschappen
Rijksuniversiteit Groningen
Oude Boteringestraat 23
9712GC Groningen
Netherlands

1 Dr. Robert Linn
College of Education
University of Illinois
Urbana, IL 61801

1 Mr. Phillip Livingston
Systems and Applied Sciences Corporation
6811 Kenilworth Avenue
Riverdale, MD 20840

1 Dr. Robert Lockman
Center for Naval Analysis
200 North Beauregard St.
Alexandria, VA 22311

1 Dr. Frederic M. Lord
Educational Testing Service
Princeton, NJ 08541

1 Dr. James Lumsden
Department of Psychology
University of Western Australia
Nedlands W.A. 6009
AUSTRALIA



1 Dr. Gary Marco
Stop 31-E
Educational Testing Service
Princeton, NJ 08451

1 Dr. Scott Maxwell
Department of Psychology
University of Houston
Houston, TX 77004

1 Dr. Samuel T. Mayo
Loyola University of Chicago
820 North Michigan Avenue
Chicago, IL 60611

1 Mr. Robert McKinley
American College Testing Programs
P.O. Box 168
Iowa City, IA 52243

1 Professor Jason Millman
Department of Education
Stone Hall
Cornell University
Ithaca, NY 14851

1 Dr. Robert Mislevy
711 Illinois Street
Geneva, IL 60134

1 Dr. W. Alan Nicewander
University of Oklahoma
Department of Psychology
Oklahoma City, OK 73069

1 Dr. Melvin R. Novick
358 Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. James Olson
WICAT, Inc.
1875 South State Street
Orem, UT 84057

1 Dr. Jesse Orlansky
Institute for Defense Analyses
1801 N. Beauregard St.
Alexandria, VA 22311

1 Wayne M. Patience
American Council on Education
GED Testing Service, Suite 20
One Dupont Circle, NW
Washington, DC 20036

1 Dr. James A. Paulson
Portland State University
P.O. Box 751
Portland, OR 97207

1 Mr. L. Petrullo
3695 N. Nelson St.
ARLINGTON, VA 22207

1 Dr. Richard A. Pollak
Director, Special Projects
Minnesota Educational Computing
2520 Broadway Drive
St. Paul, MN

1 Dr. Mark D. Reckase
ACT
P. O. Box 168
Iowa City, IA 52243



University of Texas-Dallas
Marketing Department
P. O. Box 688
Richardson, TX 75080

1 Dr. Andrew M. Rose
American Institutes for Research
1055 Thomas Jefferson St. NW
Washington, DC 20007

1 Dr. Lawrence Rudner
Takoma Park, MD 20012

1 Dr. J. Ryan
Department of Education
University of South Carolina
Columbia, SC 29208

1 Frank Schmidt
Department of Psychology
George Washington University
Washington, DC 20052

1 Lowell Schoer
Psychological & Quantitative Foundations
College of Education
University of Iowa
Iowa City, IA 52242

1 DR. ROBERT J. SEIDEL
INSTRUCTIONAL TECHNOLOGY GROUP
300 N. WASHINGTON ST.
ALEXANDRIA, VA 22314

1 Dr. Kazuo Shigemasu
University of Tohoku
Department of Educational Psychology
Kawauchi, Sendai 980
JAPAN

1 Dr. Edwin Shirkey
Department of Psychology
University of Central Florida
Orlando, FL 32816

1 Dr. William Sims
Center for Naval Analysis
200 North Beauregard Street
Alexandria, VA 22311

1 Dr. Richard Snow
School of Education
Stanford University
Stanford, CA 94305

1 Dr. Peter Stoloff
Center for Naval Analysis
200 North Beauregard Street
Alexandria, VA 22311

1 Dr. William Stout
University of Illinois
Department of Mathematics
Urbana, IL 61801



1 DR. PATRICK SUPPES
INSTITUTE FOR MATHEMATICAL STUDIES IN
THE SOCIAL SCIENCES
STANFORD, CA 94305

1 Dr. Hariharan Swaminathan
Laboratory of Psychometric and
Evaluation Research
School of Education
University of Massachusetts
Amherst, MA 01003

1 Dr. Kikumi Tatsuoka
Computer Based Education Research Lab
252 Engineering Research Laboratory
Urbana, IL 61801

1 Maurice Tatsuoka
220 Education Bldg
1310 S. Sixth St.
Champaign, IL 61820

1 Dr. David Thissen
Department of Psychology
University of Kansas
Lawrence, KS

1 Dr. Robert Tsutakawa
Department of Statistics
University of Missouri
Columbia, MO 65201

1 Dr. V. R. R. Uppuluri
Union Carbide Corporation
Nuclear Division
P. O. Box Y
Oak Ridge, TN 37830

1 Dr. David Vale
Assessment Systems Corporation
2233 University Avenue
Suite 310
St. Paul, MN 55114

1 Dr. Howard Wainer
Division of Psychological Studies
Educational Testing Service
Princeton, NJ 08540

1 Dr. Michael T. Waller
Department of Educational Psychology
University of Wisconsin-Milwaukee
Milwaukee, WI 53201

1 Dr. Brian Waters
300 North Washington
Alexandria, VA 22314

1 DR. GERSHON WELTMAN
PERCEPTRONICS INC.
6271 VARIEL AVE.
WOODLAND HILLS, CA 91367

1 DR. SUSAN E. WHITELY
PSYCHOLOGY DEPARTMENT
UNIVERSITY OF KANSAS
Lawrence, KS 66045

1 Dr. Rand R. Wilcox
University of Southern California
Department of Psychology
Los Angeles, CA 90007



1 Wolfgang Wildgrube
Streitkraefteamt
Box 20 50 03
D-5300 Bonn 2
WEST GERMANY

1 Dr. Bruce Williams
Department of Educational Psychology
University of Illinois
Urbana, IL 61801

1 Dr. Wendy Yen
CTB/McGraw Hill
Del Monte Research Park






Previous Publications 



Proceedings of the 1977 Computerized Adaptive Testing Conference.
July 1978.

Proceedings of the 1979 Computerized Adaptive Testing Conference.
September 1980.

Research Reports 

83-2. Bias and Information of Bayesian Adaptive Testing. March 1983.
83-1. Reliability and Validity of Adaptive and Conventional Tests in a Military
      Recruit Population. January 1983.
81-5. Dimensionality of Measured Achievement Over Time. December 1981.
81-4. Factors Influencing the Psychometric Characteristics of an Adaptive
      Testing Strategy for Test Batteries. November 1981.
81-3. A Validity Comparison of Adaptive and Conventional Strategies for Mastery
      Testing. September 1981.
Final Report: Computerized Adaptive Ability Testing. April 1981.
81-2. Effects of Immediate Feedback and Pacing of Item Presentation on Ability
      Test Performance and Psychological Reactions to Testing. February 1981.
81-1. Review of Test Theory and Methods. January 1981.
80-5. An Alternate-Forms Reliability and Concurrent Validity Comparison of
      Bayesian Adaptive and Conventional Ability Tests. December 1980.
80-4. A Comparison of Adaptive, Sequential, and Conventional Testing Strategies
      for Mastery Decisions. November 1980.
80-3. Criterion-Related Validity of Adaptive Testing Strategies. June 1980.
80-2. Interactive Computer Administration of a Spatial Reasoning Test. April
      1980.
Final Report: Computerized Adaptive Performance Evaluation. February 1980.
80-1. Effects of Immediate Knowledge of Results on Achievement Test Performance
      and Test Dimensionality. January 1980.
79-7. The Person Response Curve: Fit of Individuals to Item Characteristic Curve
      Models. December 1979.
79-6. Efficiency of an Adaptive Inter-Subtest Branching Strategy in the
      Measurement of Classroom Achievement. November 1979.
79-5. An Adaptive Testing Strategy for Mastery Decisions. September 1979.
79-4. Effect of Point-in-Time in Instruction on the Measurement of Achievement.
      August 1979.
79-3. Relationships among Achievement Level Estimates from Three Item
      Characteristic Curve Scoring Methods. April 1979.
Final Report: Bias-Free Computerized Testing. March 1979.
79-2. Effects of Computerized Adaptive Testing on Black and White Students.
      March 1979.
79-1. Computer Programs for Scoring Test Data with Item Characteristic Curve
      Models. February 1979.
78-5. An Item Bias Investigation of a Standardized Aptitude Test. December 1978.
78-4. A Construct Validation of Adaptive Achievement Testing. November 1978.
78-3. A Comparison of Levels and Dimensions of Performance in Black and White
      Groups on Tests of Vocabulary, Mathematics, and Spatial Ability.
      October 1978.

-continued inside-






Previous Publications (Continued) 



78-2. The Effects of Knowledge of Results and Test Difficulty on Ability Test
      Performance and Psychological Reactions to Testing. September 1978.
78-1. A Comparison of the Fairness of Adaptive and Conventional Testing
      Strategies. August 1978.
77-7. An Information Comparison of Conventional and Adaptive Tests in the
      Measurement of Classroom Achievement. October 1977.
77-6. An Adaptive Testing Strategy for Achievement Test Batteries. October 1977.
77-5. Calibration of an Item Pool for the Adaptive Measurement of Achievement.
      September 1977.
77-4. A Rapid Item-Search Procedure for Bayesian Adaptive Testing. May 1977.
77-3. Accuracy of Perceived Test-Item Difficulties. May 1977.
77-2. A Comparison of Information Functions of Multiple-Choice and Free-Response
      Vocabulary Items. April 1977.
77-1. Applications of Computerized Adaptive Testing. March 1977.

Final Report: Computerized Ability Testing, 1972-1975. April 1976.
76-5. Effects of Item Characteristics on Test Fairness. December 1976.
76-4. Psychological Effects of Immediate Knowledge of Results and Adaptive
      Ability Testing. June 1976.
76-3. Effects of Immediate Knowledge of Results and Adaptive Testing on Ability
      Test Performance. June 1976.
76-2. Effects of Time Limits on Test-Taking Behavior. April 1976.
76-1. Some Properties of a Bayesian Adaptive Ability Testing Strategy. March
      1976.

75-6. A Simulation Study of Stradaptive Ability Testing. December 1975.
75-5. Computerized Adaptive Trait Measurement: Problems and Prospects. November
      1975.
75-4. A Study of Computer-Administered Stradaptive Ability Testing. October
      1975.
75-3. Empirical and Simulation Studies of Flexilevel Ability Testing. July 1975.
75-2. TETREST: A FORTRAN IV Program for Calculating Tetrachoric Correlations.
      March 1975.
75-1. An Empirical Comparison of Two-Stage and Pyramidal Adaptive Ability
      Testing. February 1975.

74-5. Strategies of Adaptive Ability Measurement. December 1974.
74-4. Simulation Studies of Two-Stage Ability Testing. October 1974.
74-3. An Empirical Investigation of Computer-Administered Pyramidal Ability
      Testing. July 1974.
74-2. A Word Knowledge Item Pool for Adaptive Ability Measurement. June 1974.
74-1. A Computer Software System for Adaptive Ability Measurement. January 1974.
73-4. An Empirical Study of Computer-Administered Two-Stage Ability Testing.
      October 1973.
73-3. The Stratified Adaptive Computerized Ability Test. September 1973.
73-2. Comparison of Four Empirical Item Scoring Procedures. August 1973.
73-1. Ability Measurement: Conventional or Adaptive? February 1973.

Copies of these reports are available, while supplies last, from: 
Computerized Adaptive Testing Laboratory 
N660 Elliott Hall 
University of Minnesota 
75 East River Road 
Minneapolis MN 55455 U.S.A. 






